summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* percpu-rwsem: kill CONFIG_PERCPU_RWSEMOleg Nesterov2015-08-154-7/+1
| | | | | | | Remove CONFIG_PERCPU_RWSEM, the next patch adds the unconditional user of percpu_rw_semaphore. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
* percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()Oleg Nesterov2015-08-151-0/+19
| | | | | | | | | | | | Add percpu_rwsem_release() and percpu_rwsem_acquire() for the users which need to return to userspace with percpu-rwsem lock held and/or pass the ownership to another thread. TODO: change percpu_rwsem_release() to use rwsem_clear_owner(). We can either fold kernel/locking/rwsem.h into include/linux/rwsem.h, or add the non-inline percpu_rwsem_clear_owner(). Signed-off-by: Oleg Nesterov <oleg@redhat.com>
* percpu-rwsem: introduce percpu_down_read_trylock()Oleg Nesterov2015-08-152-0/+14
| | | | | | Add percpu_down_read_trylock(), it will have the user soon. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
* document rwsem_release() in sb_wait_write()Oleg Nesterov2015-08-151-3/+9
| | | | | | | | | | | Not only we need to avoid the warning from lockdep_sys_exit(), the caller of freeze_super() can never release this lock. Another thread can do this, so there is another reason for rwsem_release(). Plus the comment should explain why we have to fool lockdep. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
* fix the broken lockdep logic in __sb_start_write()Oleg Nesterov2015-08-151-33/+40
| | | | | | | | | | | | | | | | | | | | | | | | | 1. wait_event(frozen < level) without rwsem_acquire_read() is just wrong from lockdep perspective. If we are going to deadlock because the caller is buggy, lockdep can't detect this problem. 2. __sb_start_write() can race with thaw_super() + freeze_super(), and after "goto retry" the 2nd acquire_freeze_lock() is wrong. 3. The "tell lockdep we are doing trylock" hack doesn't look nice. I think this is correct, but this logic should be more explicit. Yes, the recursive read_lock() is fine if we hold the lock on a higher level. But we do not need to fool lockdep. If we can not deadlock in this case then try-lock must not fail and we can use use wait == F throughout this code. Note: as Dave Chinner explains, the "trylock" hack and the fat comment can be probably removed. But this needs a separate change and it will be trivial: just kill __sb_start_write() and rename do_sb_start_write() back to __sb_start_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
* introduce __sb_writers_{acquired,release}() helpersOleg Nesterov2015-08-153-10/+9
| | | | | | | | | Preparation to hide the sb->s_writers internals from xfs and btrfs. Add 2 trivial define's they can use rather than play with ->s_writers directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-08-151-1/+6
|\ | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "Just two very small & simple patches" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST KVM: x86: zero IDT limit on entry to SMM
| * KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUSTHaozhong Zhang2015-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kvm_set_msr_common() handles a guest's write to MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data written by guest and then use it to adjust TSC offset by calling a call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset() indicates whether the adjustment is in host TSC cycles or in guest TSC cycles. If SVM TSC scaling is enabled, adjust_tsc_offset() [i.e. svm_adjust_tsc_offset()] will first scale the adjustment; otherwise, it will just use the unscaled one. As the MSR write here comes from the guest, the adjustment is in guest TSC cycles. However, the current kvm_set_msr_common() uses it as a value in host TSC cycles (by using true as the 3rd parameter of adjust_tsc_offset()), which can result in an incorrect adjustment of TSC offset if SVM TSC scaling is enabled. This patch fixes this problem. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Cc: stable@vger.linux.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: x86: zero IDT limit on entry to SMMPaolo Bonzini2015-08-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | The recent BlackHat 2015 presentation "The Memory Sinkhole" mentions that the IDT limit is zeroed on entry to SMM. This is not documented, and must have changed some time after 2010 (see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf). KVM was not doing it, but the fix is easy. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-08-1510-28/+74
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: Update maintainers for DRM STI driver mm: cma: mark cma_bitmap_maxno() inline in header zram: fix pool name truncation memory-hotplug: fix wrong edge when hot add a new node .mailmap: Andrey Ryabinin has moved ipc/sem.c: update/correct memory barriers mm/hwpoison: fix panic due to split huge zero page ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem() ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held mm/hwpoison: fix page refcount of unknown non LRU page
| * | Update maintainers for DRM STI driverBenjamin Gaignard2015-08-151-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Vincent Abriou and myself as maintainers. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Cc: Vincent Abriou <vincent.abriou@st.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | mm: cma: mark cma_bitmap_maxno() inline in headerGregory Fong2015-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cma_bitmap_maxno() was marked as static and not static inline, which can cause warnings about this function not being used if this file is included in a file that does not call that function, and violates the conventions used elsewhere. The two options are to move the function implementation back to mm/cma.c or make it inline here, and it's simple enough for the latter to make sense. Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | zram: fix pool name truncationSergey Senozhatsky2015-08-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zram_meta_alloc() constructs a pool name for zs_create_pool() call as snprintf(pool_name, sizeof(pool_name), "zram%d", device_id); However, it defines pool name buffer to be only 8 bytes long (minus trailing zero), which means that we can have only 1000 pool names: zram0 -- zram999. With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000 can fail if device zram100 already exists, because snprintf() will truncate new pool name to zram100 and pass it debugfs_create_dir(), causing: debugfs dir <zram100> creation failed zram: Error creating memory pool ... and so on. Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of divice_id. We construct zram%d name earlier and keep it as a ->disk_name, no need to snprintf() it again. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | memory-hotplug: fix wrong edge when hot add a new nodeXishi Qiu2015-08-152-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we add a new node, the edge of memory may be wrong. e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G], 1. hotremove the node3, 2. then hotadd node3 with a part of memory, mem:[26G-30G], 3. call hotadd_new_pgdat() free_area_init_node() get_pfn_range_for_nid() 4. it will return wrong start_pfn and end_pfn, because we have not update the memblock. This patch also fixes a BUG_ON during hot-addition, please see http://marc.info/?l=linux-kernel&m=142961156129456&w=2 Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | .mailmap: Andrey Ryabinin has movedAndrey Ryabinin2015-08-153-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | Update my email address. Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | ipc/sem.c: update/correct memory barriersManfred Spraul2015-08-151-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sem_lock() did not properly pair memory barriers: !spin_is_locked() and spin_unlock_wait() are both only control barriers. The code needs an acquire barrier, otherwise the cpu might perform read operations before the lock test. As no primitive exists inside <include/spinlock.h> and since it seems noone wants another primitive, the code creates a local primitive within ipc/sem.c. With regards to -stable: The change of sem_wait_array() is a bugfix, the change to sem_lock() is a nop (just a preprocessor redefinition to improve the readability). The bugfix is necessary for all kernels that use sem_wait_array() (i.e.: starting from 3.10). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | mm/hwpoison: fix panic due to split huge zero pageWanpeng Li2015-08-151-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bug: ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:1957! invalid opcode: 0000 [#1] SMP Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27 Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015 task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000 RIP: split_huge_page_to_list+0xdb/0x120 Call Trace: memory_failure+0x32e/0x7c0 madvise_hwpoison+0x8b/0x160 SyS_madvise+0x40/0x240 ? do_page_fault+0x37/0x90 entry_SYSCALL_64_fastpath+0x12/0x71 Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0 RIP split_huge_page_to_list+0xdb/0x120 RSP <ffff8800db16fde8> ---[ end trace aee7ce0df8e44076 ]--- Testcase: #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <errno.h> #include <string.h> #define MB 1024*1024 int main(void) { char *mem; posix_memalign((void **)&mem, 2 * MB, 200 * MB); madvise(mem, 200 * MB, MADV_HWPOISON); free(mem); return 0; } Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag. The get_user_pages_fast() which called in madvise_hwpoison() will get huge zero page if the page is not allocated before. Huge zero page is a tranparent huge page, however, it is not an anonymous page. memory_failure will split the huge zero page and trigger BUG_ON(is_huge_zero_page(page)); After commit 98ed2b0052e6 ("mm/memory-failure: give up error handling for non-tail-refcounted thp"), memory_failure will not catch non anon thp from madvise_hwpoison path and this bug occur. Fix it by catching non anon thp in memory_failure in order to not split huge zero page in madvise_hwpoison path. After this patch: Injecting memory failure for page 0x202800 at 0x7fd8ae800000 MCE: 0x202800: non anonymous thp [...] [akpm@linux-foundation.org: remove second split, per Wanpeng] Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()Herton R. Krzesinski2015-08-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we acquire the sma->sem_perm lock in exit_sem(), we are protected against a racing IPC_RMID operation. Also at that point, we are the last user of sem_undo_list. Therefore it isn't required that we acquire or use ulp->lock. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> CC: Aristeu Rozanski <aris@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | ipc,sem: fix use after free on IPC_RMID after a task using same semaphore ↵Herton R. Krzesinski2015-08-151-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set exits The current semaphore code allows a potential use after free: in exit_sem we may free the task's sem_undo_list while there is still another task looping through the same semaphore set and cleaning the sem_undo list at freeary function (the task called IPC_RMID for the same semaphore set). For example, with a test program [1] running which keeps forking a lot of processes (which then do a semop call with SEM_UNDO flag), and with the parent right after removing the semaphore set with IPC_RMID, and a kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and CONFIG_DEBUG_SPINLOCK, you can easily see something like the following in the kernel log: Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64 000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b kkkkkkkk.kkkkkkk 010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........ Prev obj: start=ffff88003b45c180, len=64 000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ 010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff ...........7.... Next obj: start=ffff88003b45c200, len=64 000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ 010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff ........h).<.... BUG: spinlock wrong CPU on CPU#2, test/18028 general protection fault: 0000 [#1] SMP Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib] CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 RIP: spin_dump+0x53/0xc0 Call Trace: spin_bug+0x30/0x40 do_raw_spin_unlock+0x71/0xa0 _raw_spin_unlock+0xe/0x10 freeary+0x82/0x2a0 ? _raw_spin_lock+0xe/0x10 semctl_down.clone.0+0xce/0x160 ? __do_page_fault+0x19a/0x430 ? __audit_syscall_entry+0xa8/0x100 SyS_semctl+0x236/0x2c0 ? syscall_trace_leave+0xde/0x130 entry_SYSCALL_64_fastpath+0x12/0x71 Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89 RIP [<ffffffff810d6053>] spin_dump+0x53/0xc0 RSP <ffff88003750fd68> ---[ end trace 783ebb76612867a0 ]--- NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib] CPU: 3 PID: 18053 Comm: test Tainted: G D 4.2.0-rc5+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 RIP: native_read_tsc+0x0/0x20 Call Trace: ? delay_tsc+0x40/0x70 __delay+0xf/0x20 do_raw_spin_lock+0x96/0x140 _raw_spin_lock+0xe/0x10 sem_lock_and_putref+0x11/0x70 SYSC_semtimedop+0x7bf/0x960 ? handle_mm_fault+0xbf6/0x1880 ? dequeue_task_fair+0x79/0x4a0 ? __do_page_fault+0x19a/0x430 ? kfree_debugcheck+0x16/0x40 ? __do_page_fault+0x19a/0x430 ? __audit_syscall_entry+0xa8/0x100 ? do_audit_syscall_entry+0x66/0x70 ? syscall_trace_enter_phase1+0x139/0x160 SyS_semtimedop+0xe/0x10 SyS_semop+0x10/0x20 entry_SYSCALL_64_fastpath+0x12/0x71 Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9 Kernel panic - not syncing: softlockup: hung tasks I wasn't able to trigger any badness on a recent kernel without the proper config debugs enabled, however I have softlockup reports on some kernel versions, in the semaphore code, which are similar as above (the scenario is seen on some servers running IBM DB2 which uses semaphore syscalls). The patch here fixes the race against freeary, by acquiring or waiting on the sem_undo_list lock as necessary (exit_sem can race with freeary, while freeary sets un->semid to -1 and removes the same sem_undo from list_proc or when it removes the last sem_undo). After the patch I'm unable to reproduce the problem using the test case [1]. [1] Test case used below: #include <stdio.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/sem.h> #include <sys/wait.h> #include <stdlib.h> #include <time.h> #include <unistd.h> #include <errno.h> #define NSEM 1 #define NSET 5 int sid[NSET]; void thread() { struct sembuf op; int s; uid_t pid = getuid(); s = rand() % NSET; op.sem_num = pid % NSEM; op.sem_op = 1; op.sem_flg = SEM_UNDO; semop(sid[s], &op, 1); exit(EXIT_SUCCESS); } void create_set() { int i, j; pid_t p; union { int val; struct semid_ds *buf; unsigned short int *array; struct seminfo *__buf; } un; /* Create and initialize semaphore set */ for (i = 0; i < NSET; i++) { sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT); if (sid[i] < 0) { perror("semget"); exit(EXIT_FAILURE); } } un.val = 0; for (i = 0; i < NSET; i++) { for (j = 0; j < NSEM; j++) { if (semctl(sid[i], j, SETVAL, un) < 0) perror("semctl"); } } /* Launch threads that operate on semaphore set */ for (i = 0; i < NSEM * NSET * NSET; i++) { p = fork(); if (p < 0) perror("fork"); if (p == 0) thread(); } /* Free semaphore set */ for (i = 0; i < NSET; i++) { if (semctl(sid[i], NSEM, IPC_RMID)) perror("IPC_RMID"); } /* Wait for forked processes to exit */ while (wait(NULL)) { if (errno == ECHILD) break; }; } int main(int argc, char **argv) { pid_t p; srand(time(NULL)); while (1) { p = fork(); if (p < 0) { perror("fork"); exit(EXIT_FAILURE); } if (p == 0) { create_set(); goto end; } /* Wait for forked processes to exit */ while (wait(NULL)) { if (errno == ECHILD) break; }; } end: return 0; } [akpm@linux-foundation.org: use normal comment layout] Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> CC: Aristeu Rozanski <aris@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount heldWanpeng Li2015-08-151-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hugetlbfs pages will get a refcount in get_any_page() or madvise_hwpoison() if soft offlining through madvise. The refcount which is held by the soft offline path should be released if we fail to isolate hugetlbfs pages. Fix it by reducing the refcount for both isolation success and failure. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | mm/hwpoison: fix page refcount of unknown non LRU pageWanpeng Li2015-08-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After trying to drain pages from pagevec/pageset, we try to get reference count of the page again, however, the reference count of the page is not reduced if the page is still not on LRU list. Fix it by adding the put_page() to drop the page reference which is from __get_any_page(). Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds2015-08-151-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clock fix from Stephen Boyd: "A one-liner for a regression found in the PXA clock driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: pxa: pxa3xx: fix CKEN register access
| * | | clk: pxa: pxa3xx: fix CKEN register accessRobert Jarzmik2015-08-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clocks 0 to 31 are on CKENA, and not CKENB. The clock register names were inadequately inverted. As a consequence, all clock operations were happening on CKENB, because almost all but 2 clocks are on CKENA. As the clocks were activated by the bootloader in the former tests, it escaped the testing that the wrong clock gate was manipulated. The error was revealed by changing the pxa3xx-nand driver to a module, where upon unloading, the wrong clock was disabled in CKENB. Fixes: 9bbb8a338fb2 ("clk: pxa: add pxa3xx clock driver") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2015-08-141-0/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "A single clocksource driver suspend/resume fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled
| * | | | clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabledGeert Uytterhoeven2015-08-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the sh_cmt clocksource timer is disabled or enabled unconditionally on clocksource suspend resp. resume, even if a better clocksource is present (e.g. arch_sys_counter) and the sh_cmt clocksource is not enabled. As sh_cmt is a syscore device when its timer is enabled, this may lead to a genpd.prepared_count imbalance in the presence of PM Domains, which may cause a lock-up during reboot after s2ram. During suspend: - pm_genpd_prepare() is called for all non-syscore devices (incl. sh_cmt), increasing genpd.prepared_count for each device, - clocksource.suspend() is called for all clocksource devices, - sh_cmt_clocksource_suspend() calls sh_cmt_stop(), which is a no-op as the clocksource was not enabled. During resume: - clocksource.resume() is called for all clocksource devices, - sh_cmt_clocksource_resume() calls sh_cmt_start(), which enables the clocksource timer, and turns sh_cmt into a syscore device, - pm_genpd_complete() is called for all non-syscore devices (excl. sh_cmt now!), decreasing genpd.prepared_count for each device but sh_cmt. Now genpd.prepared_count of the PM Domain containing sh_cmt is still 1 instead of zero. On subsequent suspend/resume cycles, sh_cmt is still a syscore device, hence it's skipped for pm_genpd_{prepare,complete}(), keeping the imbalance of genpd.prepared_count at 1. During reboot: - platform_drv_shutdown() is called for any platform device that has a driver with a .shutdown() method (only rcar-dmac on R-Car Gen2), - platform_drv_shutdown() calls dev_pm_domain_detach(), which calls genpd_dev_pm_detach(), - genpd_dev_pm_detach() keeps calling pm_genpd_remove_device() until it doesn't return -EAGAIN[*], - If the device is part of the same PM Domain as sh_cmt, pm_genpd_remove_device() always fails with -EAGAIN due to genpd.prepared_count > 0. - Infinite loop in genpd_dev_pm_detach()[*]. [*] Commit 93af5e9354432828 ("PM / Domains: Avoid infinite loops in attach/detach code") already limited the number of loop iterations, avoiding the lock-up. To fix this, only disable or enable the clocksource timer on clocksource suspend resp. resume if the clocksource was enabled. This was tested on r8a7791/koelsch with the CPG Clock Domain: - using arch_sys_counter as the clocksource, which is the default, and which showed the problem, - using sh_cmt as a clocksource ("echo ffca0000.timer > \ /sys/devices/system/clocksource/clocksource0/current_clocksource"), which behaves the same as before. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438875126-12596-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2015-08-146-46/+96
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX' (Intel PT) race related core fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler perf/x86/intel: Fix memory leak on hot-plug allocation fail perf: Fix PERF_EVENT_IOC_PERIOD migration race perf: Fix double-free of the AUX buffer perf: Fix fasync handling on inherited events perf tools: Fix test build error when bindir contains double slash perf stat: Fix transaction lenght metrics perf: Fix running time accounting
| * | | | | perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handlerMatt Fleming2015-08-121-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tony reports that booting his 144-cpu machine with maxcpus=10 triggers the following WARN_ON(): [ 21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90() [ 21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1 [ 21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015 [ 21.045747] 0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67 [ 21.045748] 0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a [ 21.045750] ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a [ 21.045750] Call Trace: [ 21.045757] [<ffffffff81669b67>] dump_stack+0x45/0x57 [ 21.045759] [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0 [ 21.045761] [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20 [ 21.045762] [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90 [ 21.045764] [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160 [ 21.045767] [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80 [ 21.045769] [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10 [ 21.045770] [<ffffffff8107b538>] _cpu_up+0xe8/0x190 [ 21.045771] [<ffffffff8107b65a>] cpu_up+0x7a/0xa0 [ 21.045774] [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90 [ 21.045777] [<ffffffff81433b37>] device_online+0x67/0x90 [ 21.045778] [<ffffffff81433bea>] online_store+0x8a/0xa0 [ 21.045782] [<ffffffff81430e78>] dev_attr_store+0x18/0x30 [ 21.045785] [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50 [ 21.045786] [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170 [ 21.045789] [<ffffffff811f0b77>] __vfs_write+0x37/0x100 [ 21.045791] [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110 [ 21.045795] [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0 [ 21.045796] [<ffffffff811f1279>] vfs_write+0xa9/0x190 [ 21.045797] [<ffffffff811f2075>] SyS_write+0x55/0xc0 [ 21.045800] [<ffffffff81067300>] ? do_page_fault+0x30/0x80 [ 21.045804] [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 21.045805] ---[ end trace fe228b836d8af405 ]--- The root cause is that CPU_UP_PREPARE is completely the wrong notifier action from which to access cpu_data(), because smp_store_cpu_info() won't have been executed by the target CPU at that point, which in turn means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been filled out. Instead let's invoke our handler from CPU_STARTING and rename it appropriately. Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | perf/x86/intel: Fix memory leak on hot-plug allocation failPeter Zijlstra2015-08-121-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We fail to free the shared_regs allocation if the constraint_list allocation fails. Cure this and be more consistent in NULL-ing the pointers after free. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | perf: Fix PERF_EVENT_IOC_PERIOD migration racePeter Zijlstra2015-08-121-20/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I ran the perf fuzzer, which triggered some WARN()s which are due to trying to stop/restart an event on the wrong CPU. Use the normal IPI pattern to ensure we run the code on the correct CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | perf: Fix double-free of the AUX bufferBen Hutchings2015-08-121-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If rb->aux_refcount is decremented to zero before rb->refcount, __rb_free_aux() may be called twice resulting in a double free of rb->aux_pages. Fix this by adding a check to __rb_free_aux(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 57ffc5ca679f ("perf: Fix AUX buffer refcounting") Link: http://lkml.kernel.org/r/1437953468.12842.17.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | perf: Fix fasync handling on inherited eventsPeter Zijlstra2015-08-041-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vince reported that the fasync signal stuff doesn't work proper for inherited events. So fix that. Installing fasync allocates memory and sets filp->f_flags |= FASYNC, which upon the demise of the file descriptor ensures the allocation is freed and state is updated. Now for perf, we can have the events stick around for a while after the original FD is dead because of references from child events. So we cannot copy the fasync pointer around. We can however consistently use the parent's fasync, as that will be updated. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho deMelo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2015-07-312-6/+4
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix 'perf stat' transaction length metrics. (Andi Kleen) - Fix test build error when bindir contains double slash. (Pawel Moll) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | perf tools: Fix test build error when bindir contains double slashPawel Moll2015-07-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building with a prefix ending with a slash, for example: $ make prefix=/usr/local/ one of the perf tests fail to compile due to BUILD_STR macro mishandling bindir_SQ string containing with two slashes: -DBINDIR="BUILD_STR(/usr/local//bin)" with the following error: CC tests/attr.o tests/attr.c: In function ‘test__attr’: tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR); ^ tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token } ^ tests/attr.c:176:1: error: control reaches end of non-void function [-Werror=return-type] } ^ cc1: all warnings being treated as errors This patch works around the problem by "cleaning" the bindir string using make's abspath function. Signed-off-by: Pawel Moll <pawel.moll@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1438092613-21014-1-git-send-email-pawel.moll@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | perf stat: Fix transaction lenght metricsAndi Kleen2015-07-281-5/+3
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The transaction length metrics in perf stat -T broke recently. It would not match the metric correctly and always print K/sec. This was caused by a incorrect update of the cycles_in_tx statistics. Update the correct variable. Also the check for zero division was reversed, which resulted in K/sec being printed for no transactions. Fix this also up. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1438039491-22091-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf: Fix running time accountingPeter Zijlstra2015-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent fix to the shadow timestamp inadvertly broke the running time accounting. We must not update the running timestamp if we fail to schedule the event, the event will not have ran. This can (and did) result in negative total runtime because the stopped timestamp was before the running timestamp (we 'started' but never stopped the event -- because it never really started we didn't have to stop it either). Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Fixes: 72f669c0086f ("perf: Update shadow timestamp before add event") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org # 4.1 Cc: Shaohua Li <shli@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2015-08-141-1/+10
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "A single fix for a locking self-test crash" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/pvqspinlock: Fix kernel panic in locking-selftest
| * | | | | | locking/pvqspinlock: Fix kernel panic in locking-selftestWaiman Long2015-07-211-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling locking-selftest in a VM guest may cause the following kernel panic: kernel BUG at .../kernel/locking/qspinlock_paravirt.h:137! This is due to the fact that the pvqspinlock unlock function is expecting either a _Q_LOCKED_VAL or _Q_SLOW_VAL in the lock byte. This patch prevents that bug report by ignoring it when debug_locks_silent is set. Otherwise, a warning will be printed if it contains an unexpected value. With this patch applied, the kernel locking-selftest completed without any noise. Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1436663959-53092-1-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2015-08-146-36/+32
|\ \ \ \ \ \ \ | |_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Back from holidays, found these in the cracks: one nouveau revert, one vmwgfx locking fix and a bunch of exynos fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them" drm/vmwgfx: Fix execbuf locking issues drm/exynos/fimc: fix runtime pm support drm/exynos/mixer: always update INT_EN cache drm/exynos/mixer: correct vsync configuration sequence drm/exynos/mixer: fix interrupt clearing drm/exynos/hdmi: fix edid memory leak drm/exynos: gsc: fix wrong bitwise operation for swap detection
| * | | | | | Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"Alexandre Courbot2015-08-141-21/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 1addc1264852 This commit seems to cause crashes in gk104_fifo_intr_runlist() by returning 0xbad0da00 when register 0x2a00 is read. Since this commit was intended for GM20B which is not completely supported yet, let's revert it for the time being. Reported-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | | | drm/vmwgfx: Fix execbuf locking issuesThomas Hellstrom2015-08-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses two issues that cause problems with viewperf maya-03 in situation with memory pressure. The first issue causes attempts to unreserve buffers if batched reservation fails due to, for example, a signal pending. While previously the ttm_eu api was resistant against this type of error, it is no longer and the lockdep code will complain about attempting to unreserve buffers that are not reserved. The issue is resolved by avoid calling ttm_eu_backoff_reservation in the buffer reserve error path. The second issue is that the binding_mutex may be held when user-space fence objects are created and hence during memory reclaims. This may cause recursive attempts to grab the binding mutex. The issue is resolved by not holding the binding mutex across fence creation and submission. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | | | Merge branch 'exynos-drm-fixes' of ↵Dave Airlie2015-08-144-13/+22
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes This pull request fixes memory leak and some issues related to mixer and gscaler driver issues. * 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos: drm/exynos/fimc: fix runtime pm support drm/exynos/mixer: always update INT_EN cache drm/exynos/mixer: correct vsync configuration sequence drm/exynos/mixer: fix interrupt clearing drm/exynos/hdmi: fix edid memory leak drm/exynos: gsc: fix wrong bitwise operation for swap detection
| | * | | | | | drm/exynos/fimc: fix runtime pm supportMarek Szyprowski2015-08-111-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once pm_runtime_set_active() gets called, the kernel assumes that given device has already enabled runtime pm and will call pm_runtime_suspend() without matching pm_runtime_resume(). In case of DRM FIMC IPP driver, this will result in calling clk_disable() without respective call to clk_enable(). This patch removes call to pm_runtime_set_active() to ensure that pm_runtime_suspend/resume calls will match. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| | * | | | | | drm/exynos/mixer: always update INT_EN cacheAndrzej Hajda2015-08-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | INT_EN cache field was updated only by mixer_enable_vblank. The patch adds update also by mixer_disable_vblank function. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| | * | | | | | drm/exynos/mixer: correct vsync configuration sequenceAndrzej Hajda2015-08-111-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specification advises to clear vsync indicator before configuring vsync. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| | * | | | | | drm/exynos/mixer: fix interrupt clearingAndrzej Hajda2015-08-111-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver used incorrect flags to clear interrupt status. The patch fixes it. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| | * | | | | | drm/exynos/hdmi: fix edid memory leakAndrzej Hajda2015-08-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | edid returned by drm_get_edid should be freed. The patch fixes it. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| | * | | | | | drm/exynos: gsc: fix wrong bitwise operation for swap detectionHyungwon Hwang2015-08-111-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bits for rotation are not used as exclusively. So GSC_IN_ROT_270 can not be used for swap detection. The definition of it is same with GSC_IN_ROT_MASK. It is enough to check GSC_IN_ROT_90 bit is set or not to check whether width / height size swapping is needed. Signed-off-by: Hyungwon Hwang <human.hwang@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
* | | | | | | | Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds2015-08-144-5/+8
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM fixes from Russell King: "Another few small ARM fixes, mostly addressing some VDSO issues" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8410/1: VDSO: fix coarse clock monotonicity regression ARM: 8409/1: Mark ret_fast_syscall as a function ARM: 8408/1: Fix the secondary_startup function in Big Endian case ARM: 8405/1: VDSO: fix regression with toolchains lacking ld.bfd executable
| * | | | | | | | ARM: 8410/1: VDSO: fix coarse clock monotonicity regressionNathan Lynch2015-08-111-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last") it has become possible on ARM to: - Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp via syscall. - Subsequently obtain a timestamp for the same clock ID via VDSO which predates the first timestamp (by one jiffy). This is because ARM's update_vsyscall is deriving the coarse time using the __current_kernel_time interface, when it should really be using the timekeeper object provided to it by the timekeeping core. It happened to work before only because __current_kernel_time would access the same timekeeper object which had been passed to update_vsyscall. This is no longer the case. Cc: stable@vger.kernel.org Fixes: 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last") Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | | | ARM: 8409/1: Mark ret_fast_syscall as a functionDrew Richardson2015-08-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ret_fast_syscall runs when user space makes a syscall. However it needs to be marked as such so the ELF information is correct. Before it was: 101: 8000f300 0 NOTYPE LOCAL DEFAULT 2 ret_fast_syscall But with this change it correctly shows as: 101: 8000f300 96 FUNC LOCAL DEFAULT 2 ret_fast_syscall I see this function when using perf to unwind call stacks from kernel space to user space. Without this change I would need to add some special case logic when using the vmlinux ELF information. Signed-off-by: Drew Richardson <drew.richardson@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>