| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Remove CONFIG_PERCPU_RWSEM, the next patch adds the unconditional
user of percpu_rw_semaphore.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add percpu_rwsem_release() and percpu_rwsem_acquire() for the users
which need to return to userspace with percpu-rwsem lock held and/or
pass the ownership to another thread.
TODO: change percpu_rwsem_release() to use rwsem_clear_owner(). We can
either fold kernel/locking/rwsem.h into include/linux/rwsem.h, or add
the non-inline percpu_rwsem_clear_owner().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
|
|
|
|
| |
Add percpu_down_read_trylock(), it will have the user soon.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Not only we need to avoid the warning from lockdep_sys_exit(), the
caller of freeze_super() can never release this lock. Another thread
can do this, so there is another reason for rwsem_release().
Plus the comment should explain why we have to fool lockdep.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. wait_event(frozen < level) without rwsem_acquire_read() is just
wrong from lockdep perspective. If we are going to deadlock
because the caller is buggy, lockdep can't detect this problem.
2. __sb_start_write() can race with thaw_super() + freeze_super(),
and after "goto retry" the 2nd acquire_freeze_lock() is wrong.
3. The "tell lockdep we are doing trylock" hack doesn't look nice.
I think this is correct, but this logic should be more explicit.
Yes, the recursive read_lock() is fine if we hold the lock on a
higher level. But we do not need to fool lockdep. If we can not
deadlock in this case then try-lock must not fail and we can use
use wait == F throughout this code.
Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
|
|
|
|
|
|
|
|
|
| |
Preparation to hide the sb->s_writers internals from xfs and btrfs.
Add 2 trivial define's they can use rather than play with ->s_writers
directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| | |
Pull KVM fixes from Paolo Bonzini:
"Just two very small & simple patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
KVM: x86: zero IDT limit on entry to SMM
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When kvm_set_msr_common() handles a guest's write to
MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data
written by guest and then use it to adjust TSC offset by calling a
call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset()
indicates whether the adjustment is in host TSC cycles or in guest TSC
cycles. If SVM TSC scaling is enabled, adjust_tsc_offset()
[i.e. svm_adjust_tsc_offset()] will first scale the adjustment;
otherwise, it will just use the unscaled one. As the MSR write here
comes from the guest, the adjustment is in guest TSC cycles. However,
the current kvm_set_msr_common() uses it as a value in host TSC
cycles (by using true as the 3rd parameter of adjust_tsc_offset()),
which can result in an incorrect adjustment of TSC offset if SVM TSC
scaling is enabled. This patch fixes this problem.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: stable@vger.linux.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The recent BlackHat 2015 presentation "The Memory Sinkhole"
mentions that the IDT limit is zeroed on entry to SMM.
This is not documented, and must have changed some time after 2010
(see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
KVM was not doing it, but the fix is easy.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Merge fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Update maintainers for DRM STI driver
mm: cma: mark cma_bitmap_maxno() inline in header
zram: fix pool name truncation
memory-hotplug: fix wrong edge when hot add a new node
.mailmap: Andrey Ryabinin has moved
ipc/sem.c: update/correct memory barriers
mm/hwpoison: fix panic due to split huge zero page
ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
mm/hwpoison: fix page refcount of unknown non LRU page
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add Vincent Abriou and myself as maintainers.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
cma_bitmap_maxno() was marked as static and not static inline, which can
cause warnings about this function not being used if this file is included
in a file that does not call that function, and violates the conventions
used elsewhere. The two options are to move the function implementation
back to mm/cma.c or make it inline here, and it's simple enough for the
latter to make sense.
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
zram_meta_alloc() constructs a pool name for zs_create_pool() call as
snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
However, it defines pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.
With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate new pool name to zram100 and pass it debugfs_create_dir(),
causing:
debugfs dir <zram100> creation failed
zram: Error creating memory pool
... and so on.
Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
divice_id. We construct zram%d name earlier and keep it as a ->disk_name,
no need to snprintf() it again.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When we add a new node, the edge of memory may be wrong.
e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G],
1. hotremove the node3,
2. then hotadd node3 with a part of memory, mem:[26G-30G],
3. call hotadd_new_pgdat()
free_area_init_node()
get_pfn_range_for_nid()
4. it will return wrong start_pfn and end_pfn, because we have not
update the memblock.
This patch also fixes a BUG_ON during hot-addition, please see
http://marc.info/?l=linux-kernel&m=142961156129456&w=2
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Update my email address.
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sem_lock() did not properly pair memory barriers:
!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.
As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.
With regards to -stable:
The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Bug:
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:1957!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000
RIP: split_huge_page_to_list+0xdb/0x120
Call Trace:
memory_failure+0x32e/0x7c0
madvise_hwpoison+0x8b/0x160
SyS_madvise+0x40/0x240
? do_page_fault+0x37/0x90
entry_SYSCALL_64_fastpath+0x12/0x71
Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
RIP split_huge_page_to_list+0xdb/0x120
RSP <ffff8800db16fde8>
---[ end trace aee7ce0df8e44076 ]---
Testcase:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#define MB 1024*1024
int main(void)
{
char *mem;
posix_memalign((void **)&mem, 2 * MB, 200 * MB);
madvise(mem, 200 * MB, MADV_HWPOISON);
free(mem);
return 0;
}
Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag.
The get_user_pages_fast() which called in madvise_hwpoison() will get
huge zero page if the page is not allocated before. Huge zero page is a
tranparent huge page, however, it is not an anonymous page.
memory_failure will split the huge zero page and trigger
BUG_ON(is_huge_zero_page(page));
After commit 98ed2b0052e6 ("mm/memory-failure: give up error handling
for non-tail-refcounted thp"), memory_failure will not catch non anon
thp from madvise_hwpoison path and this bug occur.
Fix it by catching non anon thp in memory_failure in order to not split
huge zero page in madvise_hwpoison path.
After this patch:
Injecting memory failure for page 0x202800 at 0x7fd8ae800000
MCE: 0x202800: non anonymous thp
[...]
[akpm@linux-foundation.org: remove second split, per Wanpeng]
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After we acquire the sma->sem_perm lock in exit_sem(), we are protected
against a racing IPC_RMID operation. Also at that point, we are the last
user of sem_undo_list. Therefore it isn't required that we acquire or use
ulp->lock.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
set exits
The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).
For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:
Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b kkkkkkkk.kkkkkkk
010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........
Prev obj: start=ffff88003b45c180, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff ...........7....
Next obj: start=ffff88003b45c200, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff ........h).<....
BUG: spinlock wrong CPU on CPU#2, test/18028
general protection fault: 0000 [#1] SMP
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: spin_dump+0x53/0xc0
Call Trace:
spin_bug+0x30/0x40
do_raw_spin_unlock+0x71/0xa0
_raw_spin_unlock+0xe/0x10
freeary+0x82/0x2a0
? _raw_spin_lock+0xe/0x10
semctl_down.clone.0+0xce/0x160
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
SyS_semctl+0x236/0x2c0
? syscall_trace_leave+0xde/0x130
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
RIP [<ffffffff810d6053>] spin_dump+0x53/0xc0
RSP <ffff88003750fd68>
---[ end trace 783ebb76612867a0 ]---
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 3 PID: 18053 Comm: test Tainted: G D 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: native_read_tsc+0x0/0x20
Call Trace:
? delay_tsc+0x40/0x70
__delay+0xf/0x20
do_raw_spin_lock+0x96/0x140
_raw_spin_lock+0xe/0x10
sem_lock_and_putref+0x11/0x70
SYSC_semtimedop+0x7bf/0x960
? handle_mm_fault+0xbf6/0x1880
? dequeue_task_fair+0x79/0x4a0
? __do_page_fault+0x19a/0x430
? kfree_debugcheck+0x16/0x40
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
? do_audit_syscall_entry+0x66/0x70
? syscall_trace_enter_phase1+0x139/0x160
SyS_semtimedop+0xe/0x10
SyS_semop+0x10/0x20
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
Kernel panic - not syncing: softlockup: hung tasks
I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).
The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).
After the patch I'm unable to reproduce the problem using the test case
[1].
[1] Test case used below:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#define NSEM 1
#define NSET 5
int sid[NSET];
void thread()
{
struct sembuf op;
int s;
uid_t pid = getuid();
s = rand() % NSET;
op.sem_num = pid % NSEM;
op.sem_op = 1;
op.sem_flg = SEM_UNDO;
semop(sid[s], &op, 1);
exit(EXIT_SUCCESS);
}
void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;
/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}
/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}
/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
int main(int argc, char **argv)
{
pid_t p;
srand(time(NULL));
while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}
[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Hugetlbfs pages will get a refcount in get_any_page() or
madvise_hwpoison() if soft offlining through madvise. The refcount which
is held by the soft offline path should be released if we fail to isolate
hugetlbfs pages.
Fix it by reducing the refcount for both isolation success and failure.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.
Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clock fix from Stephen Boyd:
"A one-liner for a regression found in the PXA clock driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: pxa: pxa3xx: fix CKEN register access
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Clocks 0 to 31 are on CKENA, and not CKENB. The clock register names
were inadequately inverted. As a consequence, all clock operations were
happening on CKENB, because almost all but 2 clocks are on CKENA.
As the clocks were activated by the bootloader in the former tests, it
escaped the testing that the wrong clock gate was manipulated. The error
was revealed by changing the pxa3xx-nand driver to a module, where upon
unloading, the wrong clock was disabled in CKENB.
Fixes: 9bbb8a338fb2 ("clk: pxa: add pxa3xx clock driver")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A single clocksource driver suspend/resume fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently the sh_cmt clocksource timer is disabled or enabled
unconditionally on clocksource suspend resp. resume, even if a
better clocksource is present (e.g. arch_sys_counter) and the
sh_cmt clocksource is not enabled.
As sh_cmt is a syscore device when its timer is enabled, this
may lead to a genpd.prepared_count imbalance in the presence of
PM Domains, which may cause a lock-up during reboot after s2ram.
During suspend:
- pm_genpd_prepare() is called for all non-syscore devices (incl.
sh_cmt), increasing genpd.prepared_count for each device,
- clocksource.suspend() is called for all clocksource devices,
- sh_cmt_clocksource_suspend() calls sh_cmt_stop(), which is a no-op
as the clocksource was not enabled.
During resume:
- clocksource.resume() is called for all clocksource devices,
- sh_cmt_clocksource_resume() calls sh_cmt_start(), which enables the
clocksource timer, and turns sh_cmt into a syscore device,
- pm_genpd_complete() is called for all non-syscore devices (excl.
sh_cmt now!), decreasing genpd.prepared_count for each device but
sh_cmt.
Now genpd.prepared_count of the PM Domain containing sh_cmt is
still 1 instead of zero. On subsequent suspend/resume cycles,
sh_cmt is still a syscore device, hence it's skipped for
pm_genpd_{prepare,complete}(), keeping the imbalance of
genpd.prepared_count at 1.
During reboot:
- platform_drv_shutdown() is called for any platform device that has
a driver with a .shutdown() method (only rcar-dmac on R-Car Gen2),
- platform_drv_shutdown() calls dev_pm_domain_detach(), which
calls genpd_dev_pm_detach(),
- genpd_dev_pm_detach() keeps calling pm_genpd_remove_device() until
it doesn't return -EAGAIN[*],
- If the device is part of the same PM Domain as sh_cmt,
pm_genpd_remove_device() always fails with -EAGAIN due to
genpd.prepared_count > 0.
- Infinite loop in genpd_dev_pm_detach()[*].
[*] Commit 93af5e9354432828 ("PM / Domains: Avoid infinite loops in
attach/detach code") already limited the number of loop iterations,
avoiding the lock-up.
To fix this, only disable or enable the clocksource timer on
clocksource suspend resp. resume if the clocksource was enabled.
This was tested on r8a7791/koelsch with the CPG Clock Domain:
- using arch_sys_counter as the clocksource, which is the default, and
which showed the problem,
- using sh_cmt as a clocksource ("echo ffca0000.timer > \
/sys/devices/system/clocksource/clocksource0/current_clocksource"),
which behaves the same as before.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438875126-12596-2-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX'
(Intel PT) race related core fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
perf/x86/intel: Fix memory leak on hot-plug allocation fail
perf: Fix PERF_EVENT_IOC_PERIOD migration race
perf: Fix double-free of the AUX buffer
perf: Fix fasync handling on inherited events
perf tools: Fix test build error when bindir contains double slash
perf stat: Fix transaction lenght metrics
perf: Fix running time accounting
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Tony reports that booting his 144-cpu machine with maxcpus=10 triggers
the following WARN_ON():
[ 21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90()
[ 21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1
[ 21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015
[ 21.045747] 0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67
[ 21.045748] 0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a
[ 21.045750] ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a
[ 21.045750] Call Trace:
[ 21.045757] [<ffffffff81669b67>] dump_stack+0x45/0x57
[ 21.045759] [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0
[ 21.045761] [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20
[ 21.045762] [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90
[ 21.045764] [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160
[ 21.045767] [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80
[ 21.045769] [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10
[ 21.045770] [<ffffffff8107b538>] _cpu_up+0xe8/0x190
[ 21.045771] [<ffffffff8107b65a>] cpu_up+0x7a/0xa0
[ 21.045774] [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90
[ 21.045777] [<ffffffff81433b37>] device_online+0x67/0x90
[ 21.045778] [<ffffffff81433bea>] online_store+0x8a/0xa0
[ 21.045782] [<ffffffff81430e78>] dev_attr_store+0x18/0x30
[ 21.045785] [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50
[ 21.045786] [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170
[ 21.045789] [<ffffffff811f0b77>] __vfs_write+0x37/0x100
[ 21.045791] [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110
[ 21.045795] [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0
[ 21.045796] [<ffffffff811f1279>] vfs_write+0xa9/0x190
[ 21.045797] [<ffffffff811f2075>] SyS_write+0x55/0xc0
[ 21.045800] [<ffffffff81067300>] ? do_page_fault+0x30/0x80
[ 21.045804] [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 21.045805] ---[ end trace fe228b836d8af405 ]---
The root cause is that CPU_UP_PREPARE is completely the wrong notifier
action from which to access cpu_data(), because smp_store_cpu_info()
won't have been executed by the target CPU at that point, which in turn
means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
filled out.
Instead let's invoke our handler from CPU_STARTING and rename it
appropriately.
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We fail to free the shared_regs allocation if the constraint_list
allocation fails.
Cure this and be more consistent in NULL-ing the pointers after free.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
I ran the perf fuzzer, which triggered some WARN()s which are due to
trying to stop/restart an event on the wrong CPU.
Use the normal IPI pattern to ensure we run the code on the correct CPU.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If rb->aux_refcount is decremented to zero before rb->refcount,
__rb_free_aux() may be called twice resulting in a double free of
rb->aux_pages. Fix this by adding a check to __rb_free_aux().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 57ffc5ca679f ("perf: Fix AUX buffer refcounting")
Link: http://lkml.kernel.org/r/1437953468.12842.17.camel@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.
Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.
Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix 'perf stat' transaction length metrics. (Andi Kleen)
- Fix test build error when bindir contains double slash. (Pawel Moll)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When building with a prefix ending with a slash, for example:
$ make prefix=/usr/local/
one of the perf tests fail to compile due to BUILD_STR macro mishandling
bindir_SQ string containing with two slashes:
-DBINDIR="BUILD_STR(/usr/local//bin)"
with the following error:
CC tests/attr.o
tests/attr.c: In function ‘test__attr’:
tests/attr.c:168:50: error: expected ‘)’ before ‘;’ token
snprintf(path_perf, PATH_MAX, "%s/perf", BINDIR);
^
tests/attr.c:176:1: error: expected ‘;’ before ‘}’ token
}
^
tests/attr.c:176:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
cc1: all warnings being treated as errors
This patch works around the problem by "cleaning" the bindir string
using make's abspath function.
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438092613-21014-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |/ / / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The transaction length metrics in perf stat -T broke recently.
It would not match the metric correctly and always print K/sec.
This was caused by a incorrect update of the cycles_in_tx statistics.
Update the correct variable.
Also the check for zero division was reversed, which resulted in K/sec
being printed for no transactions. Fix this also up.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1438039491-22091-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
A recent fix to the shadow timestamp inadvertly broke the running time
accounting.
We must not update the running timestamp if we fail to schedule the
event, the event will not have ran. This can (and did) result in
negative total runtime because the stopped timestamp was before the
running timestamp (we 'started' but never stopped the event -- because
it never really started we didn't have to stop it either).
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 72f669c0086f ("perf: Update shadow timestamp before add event")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 4.1
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"A single fix for a locking self-test crash"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock: Fix kernel panic in locking-selftest
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Enabling locking-selftest in a VM guest may cause the following
kernel panic:
kernel BUG at .../kernel/locking/qspinlock_paravirt.h:137!
This is due to the fact that the pvqspinlock unlock function is
expecting either a _Q_LOCKED_VAL or _Q_SLOW_VAL in the lock
byte. This patch prevents that bug report by ignoring it when
debug_locks_silent is set. Otherwise, a warning will be printed
if it contains an unexpected value.
With this patch applied, the kernel locking-selftest completed
without any noise.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436663959-53092-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \ \ \
| |_|_|_|_|/ /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull drm fixes from Dave Airlie:
"Back from holidays, found these in the cracks: one nouveau revert, one
vmwgfx locking fix and a bunch of exynos fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"
drm/vmwgfx: Fix execbuf locking issues
drm/exynos/fimc: fix runtime pm support
drm/exynos/mixer: always update INT_EN cache
drm/exynos/mixer: correct vsync configuration sequence
drm/exynos/mixer: fix interrupt clearing
drm/exynos/hdmi: fix edid memory leak
drm/exynos: gsc: fix wrong bitwise operation for swap detection
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This reverts commit 1addc1264852
This commit seems to cause crashes in gk104_fifo_intr_runlist() by
returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
intended for GM20B which is not completely supported yet, let's revert
it for the time being.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This addresses two issues that cause problems with viewperf maya-03 in
situation with memory pressure.
The first issue causes attempts to unreserve buffers if batched
reservation fails due to, for example, a signal pending. While previously
the ttm_eu api was resistant against this type of error, it is no longer
and the lockdep code will complain about attempting to unreserve buffers
that are not reserved. The issue is resolved by avoid calling
ttm_eu_backoff_reservation in the buffer reserve error path.
The second issue is that the binding_mutex may be held when user-space
fence objects are created and hence during memory reclaims. This may cause
recursive attempts to grab the binding mutex. The issue is resolved by not
holding the binding mutex across fence creation and submission.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
This pull request fixes memory leak and some issues related to
mixer and gscaler driver issues.
* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos/fimc: fix runtime pm support
drm/exynos/mixer: always update INT_EN cache
drm/exynos/mixer: correct vsync configuration sequence
drm/exynos/mixer: fix interrupt clearing
drm/exynos/hdmi: fix edid memory leak
drm/exynos: gsc: fix wrong bitwise operation for swap detection
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Once pm_runtime_set_active() gets called, the kernel assumes that given
device has already enabled runtime pm and will call pm_runtime_suspend()
without matching pm_runtime_resume(). In case of DRM FIMC IPP driver,
this will result in calling clk_disable() without respective call to
clk_enable(). This patch removes call to pm_runtime_set_active() to
ensure that pm_runtime_suspend/resume calls will match.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
INT_EN cache field was updated only by mixer_enable_vblank.
The patch adds update also by mixer_disable_vblank function.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Specification advises to clear vsync indicator before configuring vsync.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The driver used incorrect flags to clear interrupt status.
The patch fixes it.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
edid returned by drm_get_edid should be freed.
The patch fixes it.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The bits for rotation are not used as exclusively. So GSC_IN_ROT_270 can
not be used for swap detection. The definition of it is same with
GSC_IN_ROT_MASK. It is enough to check GSC_IN_ROT_90 bit is set or not to
check whether width / height size swapping is needed.
Signed-off-by: Hyungwon Hwang <human.hwang@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Pull ARM fixes from Russell King:
"Another few small ARM fixes, mostly addressing some VDSO issues"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8410/1: VDSO: fix coarse clock monotonicity regression
ARM: 8409/1: Mark ret_fast_syscall as a function
ARM: 8408/1: Fix the secondary_startup function in Big Endian case
ARM: 8405/1: VDSO: fix regression with toolchains lacking ld.bfd executable
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") it has become possible on ARM to:
- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
predates the first timestamp (by one jiffy).
This is because ARM's update_vsyscall is deriving the coarse time
using the __current_kernel_time interface, when it should really be
using the timekeeper object provided to it by the timekeeping core.
It happened to work before only because __current_kernel_time would
access the same timekeeper object which had been passed to
update_vsyscall. This is no longer the case.
Cc: stable@vger.kernel.org
Fixes: 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the real timekeeper last")
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
ret_fast_syscall runs when user space makes a syscall. However it
needs to be marked as such so the ELF information is correct. Before
it was:
101: 8000f300 0 NOTYPE LOCAL DEFAULT 2 ret_fast_syscall
But with this change it correctly shows as:
101: 8000f300 96 FUNC LOCAL DEFAULT 2 ret_fast_syscall
I see this function when using perf to unwind call stacks from kernel
space to user space. Without this change I would need to add some
special case logic when using the vmlinux ELF information.
Signed-off-by: Drew Richardson <drew.richardson@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|