path: root/Documentation/assoc_array.txt
Commit message / Author / Files / Lines
2016-10-08x86/pkeys: Make protection keys an "eager" featureDave Hansen1-3/+4
Our XSAVE features are divided into two categories: those that generate FPU exceptions, and those that do not. MPX and pkeys do not generate FPU exceptions and thus cannot be used lazily. We disable them when lazy mode is forced on. We have a pair of masks to collect these two sets of features, but XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY. Fix it by moving the feature to XFEATURE_MASK_EAGER. Note: this only causes a problem if you boot with lazy FPU mode (eagerfpu=off) which is *not* the default. It also only affects hardware which is not currently publicly available. It looks like eager mode is going away, but we still need this patch applied to any kernel that has protection keys and lazy mode, which is 4.6 through 4.8 at this point, and 4.9 if the lazy removal isn't sent to Linus for 4.9. Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures") Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-08x86/apic: Prevent pointless warning messagesThomas Gleixner1-3/+5
Markus reported that he sees new warnings: APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored. APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored. This comes from the recent persistent cpuid - nodeid changes. The code which emits the warning has been called prior to these changes only for enabled processors. Now it's called for disabled processors as well to get the possible cpu accounting correct. So if the kernel is compiled for the number of actual available/enabled CPUs and the BIOS reports disabled CPUs as well then the above warnings are printed. That's a pointless exercise as it only makes sense if there are more CPUs enabled than the kernel supports. Make the warning conditional on enabled processors so we are back to the state before these changes. Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: linux-acpi@vger.kernel.org Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-08x86/acpi: Prevent LAPIC id 0xff from being accountedThomas Gleixner1-0/+4
Yinghai reported that the recent changes to make the cpuid - nodeid relationship permanent cause a cpuid ordering regression on a system which has x2apic enabled. The reason is that the ACPI local APIC parser has no sanity check for apicid 0xff, which is an invalid id. So a CPU id for this invalid local APIC id is allocated and therefore breaks the cpuid ordering. Add a sanity check to acpi_parse_lapic() which ignores the invalid id. Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>, Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: douly.fnst@cn.fujitsu.com, Cc: zhugh.fnst@cn.fujitsu.com Cc: Tony Luck <tony.luck@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Lv Zheng <lv.zheng@intel.com>, Cc: robert.moore@intel.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
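The sanity check described in this commit message can be pictured with a small sketch. This is not the kernel's acpi_parse_lapic(); the names INVALID_LAPIC_ID, lapic_id_is_valid() and count_accounted_ids() are hypothetical and only illustrate how skipping id 0xff keeps an invalid entry from being assigned a CPU id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: 0xff is the invalid local APIC id named in the commit message. */
#define INVALID_LAPIC_ID 0xffu

/* Returns true when an 8-bit local APIC id may be accounted. */
static bool lapic_id_is_valid(uint8_t apic_id)
{
	return apic_id != INVALID_LAPIC_ID;
}

/* Counts the ids that would receive a CPU id, skipping invalid ones. */
static size_t count_accounted_ids(const uint8_t *ids, size_t n)
{
	size_t accounted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!lapic_id_is_valid(ids[i]))
			continue;	/* ignore the entry, no CPU id is allocated */
		accounted++;
	}
	return accounted;
}
```

Because invalid entries are dropped before any id is handed out, the ordering of the remaining entries is unaffected by them.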
2016-10-07arch/x86: Handle non enumerated CPU after physical hotplugPrarit Bhargava1-3/+15
When a CPU is physically added to a system then the MADT table is not updated. If subsequently a kdump kernel is started on that physically added CPU then the ACPI enumeration fails to provide the information for this CPU which is now the boot CPU of the kdump kernel. As a consequence, generic_processor_info() is not invoked for that CPU so the number of enumerated processors is 0 and none of the initializations, including the logical package id management, are performed. We have code which relies on the correctness of the logical package map and other information which is initialized via generic_processor_info(). Executing such code will result in undefined behaviour or kernel crashes. This problem applies only to the kdump kernel because a normal kexec will switch to the original boot CPU, which is enumerated in MADT, before jumping into the kexec kernel. The boot code already has a check for num_processors equal 0 in prefill_possible_map(). We can use that check as an indicator that the enumeration of the boot CPU did not happen and invoke generic_processor_info() for it. That initializes the relevant data for the boot CPU and therefore prevents subsequent failure. [ tglx: Refined the code and rewrote the changelog ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: dyoung@redhat.com Cc: Eric Biederman <ebiederm@xmission.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-06x86/unwind: Fix oprofile module link errorJosh Poimboeuf2-12/+12
When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n, the oprofile module fails to link: ERROR: "ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined! The problem was introduced when oprofile was converted to use the new x86 unwinder. When frame pointers are disabled, the "guess" unwinder's unwind_get_return_address() is an inline function which calls ftrace_graph_ret_addr(), which is not exported. Fix it by converting the "guess" version of unwind_get_return_address() to an exported out-of-line function, just like its frame pointer counterpart. Reported-by: Karl Beldan <karl.beldan@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: ec2ad9ccf12d ("oprofile/x86: Convert x86_backtrace() to use the new unwinder") Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-05x86/vmware: Skip lapic calibration on VMwareRenat Valiullin1-2/+10
In a virtualized environment the APIC timer calibration can go wrong when the host is overcommitted or the guest is running nested. This results in the APIC timers operating at an incorrect frequency. Since VMware supports a mechanism to retrieve the local APIC frequency we can ask the hypervisor for it and skip the APIC calibration loop. Signed-off-by: Renat Valiullin <rvaliullin@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161004201148.GA1421@uu64vm Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-05x86/syscalls: Remove bash-isms in syscall table generatorsylvain.bertrand@gmail.com1-6/+9
Signed-off-by: Sylvain BERTRAND <sylvain.bertrand@gmail.com> Link: http://lkml.kernel.org/r/20160929162234.GA29592@freedom Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-04x86/irq: Prevent force migration of irqs which are not in the vector domainMika Westerberg1-3/+20
When a CPU is about to be offlined we call fixup_irqs() that resets IRQ affinities related to the CPU in question. The same thing is also done when the system is suspended to S-states like S3 (mem). For each IRQ we try to complete any on-going move regardless of whether the IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch its chip_data, assume it is of type struct apic_chip_data and manipulate it by clearing old_domain mask etc. irq_chips that are not part of the x86_vector_domain, like those created by various GPIO drivers, will find their chip_data changed unexpectedly. Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets corrupted after resume: # cat /sys/kernel/debug/gpio gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00: gpio-511 ( |sysfs ) in hi # rtcwake -s10 -mmem <10 seconds passes> # cat /sys/kernel/debug/gpio gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00: gpio-511 ( |sysfs ) in ? Note '?' in the output. It means the struct gpio_chip ->get function is NULL whereas before suspend it was there. Fix this by first checking that the IRQ belongs to x86_vector_domain before we try to use the chip_data as struct apic_chip_data. Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # 4.4+ Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-03Linux 4.8v4.8Linus Torvalds1-1/+1
2016-10-02ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7Srinivas Ramana1-1/+1
If the bootloader uses the long descriptor format and jumps to kernel decompressor code, TTBCR may not be in a right state. Before enabling the MMU, it is required to clear the TTBCR.PD0 field to use TTBR0 for translation table walks. The commit dbece45894d3a ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but doesn't consider all the bits for the size of TTBCR.N. Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to indicate the use of TTBR0 and the correct base address width. Fixes: dbece45894d3 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores") Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Srinivas Ramana <sramana@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
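The register update described here amounts to clearing two fields. The sketch below shows that mask arithmetic in C rather than in the decompressor's assembly. The bit positions (N in bits [2:0], PD0 in bit 4) are an assumption taken from the ARMv7 short-descriptor TTBCR layout, and the names TTBCR_N_MASK, TTBCR_PD0 and ttbcr_select_ttbr0() are hypothetical.

```c
#include <stdint.h>

/* Assumed layout (ARMv7 short-descriptor TTBCR): N in bits [2:0], PD0 in bit 4. */
#define TTBCR_N_MASK	(UINT32_C(7) << 0)
#define TTBCR_PD0	(UINT32_C(1) << 4)

/* Clears all three N bits and PD0 so that table walks use TTBR0. */
static uint32_t ttbcr_select_ttbr0(uint32_t ttbcr)
{
	return ttbcr & ~(TTBCR_N_MASK | TTBCR_PD0);
}
```

A mask covering only two of the N bits would leave a value such as 0x4 untouched, which is the kind of leftover state the commit message describes.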
2016-10-02MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systemsPaul Burton1-0/+11
When discovering the number of VPEs per core, smp_num_siblings will be incorrect for kernels built without support for the MIPS MultiThreading (MT) ASE running on systems which implement said ASE. This leads to accesses to VPEs in secondary cores being performed incorrectly since mips_cm_vp_id calculates the wrong ID to write to the local "other" registers. Fix this by examining the number of VPEs in the core as reported by the CM. This patch presumes that the number of VPEs will be the same in each core of the system. As this path only applies to systems with CM version 2.5 or lower, and this property is true of all such known systems, this is likely to be fine but is described in a comment for good measure. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14338/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-10-01include/linux/property.h: fix typo/compile errorJohn Youn1-1/+1
This fixes commit d76eebfa175e ("include/linux/property.h: fix build issues with gcc-4.4.4"). With that commit we get the following compile error when using the PROPERTY_ENTRY_INTEGER_ARRAY macro. include/linux/property.h:201:39: error: `u32_data' undeclared (first use in this function) PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_) ^ include/linux/property.h:193:17: note: in definition of macro `PROPERTY_ENTRY_INTEGER_ARRAY' { .pointer = { _type_##_data = _val_ } }, \ ^ This needs a '.' to reference the union member. It seems this was just overlooked here since it is done correctly in similar constructs in other parts of the original commit. This fix is in preparation of upcoming commits that will use this macro. Fixes: commit d76eebfa175e ("include/linux/property.h: fix build issues with gcc-4.4.4") Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.com Signed-off-by: John Youn <johnyoun@synopsys.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
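The role of the missing '.' can be shown with a reduced example. The struct and macro below are hypothetical stand-ins (demo_entry, DEMO_ENTRY_ARRAY), not the definitions in include/linux/property.h; they only illustrate that a pasted member name needs a leading '.' to act as a designated initializer for a union member.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a property entry; not the kernel's struct. */
struct demo_entry {
	const char *name;
	union {
		const uint32_t *u32_data;
		const char *str;
	} pointer;
};

/*
 * The '.' before _type_##_data turns the pasted name into a designator
 * for the union member. Without it, the compiler looks for a variable
 * called u32_data and reports it as undeclared.
 */
#define DEMO_ENTRY_ARRAY(_name_, _type_, _val_) \
	{ .name = (_name_), .pointer = { ._type_##_data = (_val_) } }

static const uint32_t demo_values[] = { 1, 2, 3 };
static const struct demo_entry demo = DEMO_ENTRY_ARRAY("demo", u32, demo_values);
```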
2016-10-01ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()Eric Ren1-0/+10
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally. In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it; there are 2 processes repeatedly performing the following operations respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the other is doing ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE) again and again. This is the backtrace when the deadlock happens: __wait_on_bit_lock+0x50/0xa0 __lock_page+0xb7/0xc0 ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2] do_page_mkwrite+0x66/0xc0 handle_mm_fault+0x685/0x1350 __do_page_fault+0x1d8/0x4d0 trace_do_page_fault+0x37/0xf0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 In ocfs2_write_begin_nolock(), we first grab the pages and then allocate disk space for this write; ocfs2_try_to_free_truncate_log() will be called if -ENOSPC is returned; if we're lucky to get enough clusters, which is usually the case, we start over again. But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we will deadlock when trying to grab the target page again. Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write(). Another deadlock will happen in __do_page_mkwrite() if ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED along with a locked target page. These two errors fail on the same path, so fix them by unlocking the target page manually before ocfs2_free_write_ctxt(). Jan Kara helped me clear out the JBD2 part, and suggested the hint for the root cause. Changes since v1: 1. Also take the ENOMEM error case into consideration.
Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: He Gang <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-01mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()Johannes Weiner3-63/+63
Antonio reports the following crash when using fuse under memory pressure: kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346! invalid opcode: 0000 [#1] SMP Modules linked in: all of them CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000 RIP: shadow_lru_isolate+0x181/0x190 Call Trace: __list_lru_walk_one.isra.3+0x8f/0x130 list_lru_walk_one+0x23/0x30 scan_shadow_nodes+0x34/0x50 shrink_slab.part.40+0x1ed/0x3d0 shrink_zone+0x2ca/0x2e0 kswapd+0x51e/0x990 kthread+0xd8/0xf0 ret_from_fork+0x3f/0x70 which corresponds to the following sanity check in the shadow node tracking: BUG_ON(node->count & RADIX_TREE_COUNT_MASK); The workingset code tracks radix tree nodes that exclusively contain shadow entries of evicted pages in them, and this (somewhat obscure) line checks whether there are real pages left that would interfere with reclaim of the radix tree node under memory pressure. While discussing ways how fuse might sneak pages into the radix tree past the workingset code, Miklos pointed to replace_page_cache_page(), and indeed there is a problem there: it properly accounts for the old page being removed - __delete_from_page_cache() does that - but then does a raw radix_tree_insert(), not accounting for the replacement page. Eventually the page count bits in node->count underflow while leaving the node incorrectly linked to the shadow node LRU. To address this, make sure replace_page_cache_page() uses the tracked page insertion code, page_cache_tree_insert(). This fixes the page accounting and makes sure page-containing nodes are properly unlinked from the shadow node LRU again. Also, make the sanity checks a bit less obscure by using the helpers for checking the number of pages and shadows in a radix tree node.
Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Debugged-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30ARC: [plat*] enables MODULE*Vineet Gupta5-1/+16
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARCv2: fix local_save_flagsVineet Gupta1-1/+1
Commit d9676fa152c83b ("ARCv2: Enable LOCKDEP"), changed local_save_flags() to not return raw STATUS32 but encoded in the form such that it could be fed directly to CLRI/SETI instructions. However the STATUS32.E[] was not captured correctly as it corresponds to bits [4:1] in the register and not [3:0] Fixes: d9676fa152c83b ("ARCv2: Enable LOCKDEP") Cc: Evgeny Voevodin <evgeny.voevodin@intel.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
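The off-by-one in the field position is easy to see in a small sketch. The helper below is a hypothetical illustration of extracting a 4-bit field that lives in bits [4:1], as the commit message states for STATUS32.E[]; the macro and function names are not taken from the ARC sources.

```c
#include <stdint.h>

/* E[] occupies bits [4:1] of STATUS32, as stated in the commit message. */
#define STATUS32_E_SHIFT	1
#define STATUS32_E_MASK		UINT32_C(0xf)

/* Shift first, then mask; masking bits [3:0] directly would pick up bit 0. */
static uint32_t status32_get_e(uint32_t status32)
{
	return (status32 >> STATUS32_E_SHIFT) & STATUS32_E_MASK;
}
```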
2016-09-30ARC: CONFIG_NODES_SHIFT fix default valuesNoam Camus1-2/+2
It seems the default values were assigned as absolute node counts rather than as shift values, i.e. the value should be 0 for one node (2^0) and 1 for a couple of nodes (2^1). Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
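The relation between the shift value and the node count can be written out as a one-line helper. This is only a sketch of the arithmetic; max_nodes() is a hypothetical name, not a kernel function.

```c
/* A shift value of n allows for 2^n nodes. */
static unsigned int max_nodes(unsigned int nodes_shift)
{
	return 1u << nodes_shift;
}
```

With this reading, a default of 1 already means two nodes, so using the node count itself as the default would overstate the number of nodes.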
2016-09-30ARCv2: intc: Use kflag if STATUS32.IE must be resetYuriy Kolerov1-1/+1
At the end of "arc_init_IRQ" the STATUS32.IE flag is supposed to be changed by the "flag" instruction, but "flag" never touches the IE flag on ARCv2. So the "kflag" instruction must be used instead of "flag". Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Cc: stable@vger.kernel.org #4.2+ Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: .exit.* sections can be discarded in .eh_frame regimeVineet Gupta1-8/+0
We used to keep the .exit.* sections as linker would fail in final link due to references from .debug_frame which itself could not be discarded due to the forced "write,alloc" attributes for it. | LD init/built-in.o | `.exit.text' referenced in section `.debug_frame' of arch/arc/built-in.o: defined in discarded section `.exit.text' of arch/arc/built-in.o | Makefile:949: recipe for target 'vmlinux' failed With .debug_frame now retired, this hack is no longer needed. kernel binary is now a little bit smaller as well. closes STAR 9000549913 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: dw2 unwind: enable cfi pseudo ops in string libVineet Gupta11-25/+26
This uses a new set of annotations viz. ENTRY_CFI/END_CFI to enable cfi ops generation. Note that we didn't change the normal ENTRY/EXIT as we don't actually want unwind info in the trap/exception/interrupt handlers which use these, as unwinder then gets confused (it keeps recursing vs. stopping). Semantically these are leaf routines and unwinding should stop when it hits those routines. Before ------ 28.52% 1.19% 9929 hackbench libuClibc-1.0.17.so [.] __write_nocancel | ---__write_nocancel |--8.95%--EV_Trap | --8.25%--sys_write | |--3.93%--sock_write_iter ... |--2.62%--memset <==== [LEAF entry as no unwind info] ^^^^^^ After ----- 29.46% 1.24% 13622 hackbench libuClibc-1.0.17.so [.] __write_nocancel | ---__write_nocancel |--9.31%--EV_Trap | --8.62%--sys_write | |--4.17%--sock_write_iter ... |--6.19%--sys_write | --6.19%--sock_write_iter | unix_stream_sendmsg | |--1.62%--sock_alloc_send_pskb | |--0.89%--sock_def_readable | |--0.88%--_raw_spin_unlock_irqrestore | |--0.69%--memset | | ^^^^^^ <==== [now in proper callframe] | | | --0.52%--skb_copy_datagram_from_iter Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: dw2 unwind: add infrastructure for adding cfi pseudo ops to asmVineet Gupta3-1/+52
1. detect whether binutils supports the cfi pseudo ops 2. define conditional macros to generate the ops 3. define new ENTRY_CFI/END_CFI to annotate hand asm code. - Needed because we don't want to emit dwarf info in general ENTRY/END used by lowest level trap/exception/interrupt handlers as unwinder gets confused trying to unwind out of them. We want unwinder to instead stop when it hits those routines - These provide minimal start/end cfi ops assuming routine doesn't touch stack memory/regs Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: entry: make ret_from_system_call local labelVineet Gupta1-7/+5
This essentially removes the ENTRY() assembler annotation for this symbol since it didn't have a pairing END(). This is ahead of introducing cfi pseudo ops in ENTRY/END, which expect paired cfi_startproc/cfi_endproc | ../arch/arc/kernel/entry.S: Assembler messages: | ../arch/arc/kernel/entry.S:270: Error: previous CFI entry not closed (missing .cfi_endproc) | ../scripts/Makefile.build:326: recipe for target 'arch/arc/kernel/entry-arcv2.o' failed | make[4]: *** [arch/arc/kernel/entry-arcv2.o] Error 1 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: dw2 unwind: don't force dwarf 2Vineet Gupta1-6/+0
In .debug_frame based unwinding regime, we used to force -gdwarf-2 since kernel unwinder only claimed to handle dwarf 2. This changed since commit 6d0d506012c93d ("ARC: dw2 unwind: Don't bail for CIE.version != 1") which added some support beyond dwarf 2, at least to handle CIE != 1 The ill-effect of -gdwarf-2 is that it forces generation of .debug_* sections, which bloats loadable modules .ko files. For the curious, this doesn't affect vmlinux binary since linker script discards .debug_* but same discard is not yet implemented for modules. So it seems we can drop the -gdwarf-2 toggle, which should not be needed anyways given that we now use .eh_frame based unwinding. I've verified using GNU 2016.09-engo10 that the actual unwind info is not different with or w/o this toggle - but the debug_* sections are gone for good. before ----- arc-linux-readelf -S q_proc.ko-unwinding-1-eh_frame-switch | grep debug [15] .debug_info PROGBITS 00000000 000300 00d08d 00 0 0 1 [16] .rela.debug_info RELA 00000000 0162a0 008844 0c I 29 15 4 [17] .debug_abbrev PROGBITS 00000000 00d38d 0005f8 00 0 0 1 [18] .debug_loc PROGBITS 00000000 00d985 000070 00 0 0 1 [19] .rela.debug_loc RELA 00000000 01eae4 0000c0 0c I 29 18 4 [20] .debug_aranges PROGBITS 00000000 00d9f5 000040 00 0 0 1 [21] .rela.debug_arang RELA 00000000 01eba4 000030 0c I 29 20 4 [22] .debug_ranges PROGBITS 00000000 00da35 000018 00 0 0 1 [23] .rela.debug_range RELA 00000000 01ebd4 000030 0c I 29 22 4 [24] .debug_line PROGBITS 00000000 00da4d 000b5b 00 0 0 1 [25] .rela.debug_line RELA 00000000 01ec04 0000cc 0c I 29 24 4 [26] .debug_str PROGBITS 00000000 00e5a8 007831 01 MS 0 0 1 after ---- Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: dw2 unwind: switch to .eh_frame based unwindingVineet Gupta5-34/+13
So finally after almost 8 years of dealing with .debug_frame, we are finally switching to .eh_frame. The reason being stripped kernel binaries had non-functional unwinder as .debug_frame was gone. Also, in general .eh_frame seems more common way of doing unwinding. This also folds a revert of f52e126cc747 ("ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)") to ensure that we start getting .eh_frame Reported-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: dw2 unwind: factor CIE specifics for .eh_frame/.debug_frameVineet Gupta1-7/+18
This paves the way for switching to .eh_frame based unwinding Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: module: support R_ARC_32_PCREL relocationVineet Gupta2-4/+5
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30arc: perf: Enable generic "cache-references" and "cache-misses" eventsAlexey Brodkin2-2/+7
We used to live with PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES not specified on ARC. Those events are actually aliases to 2 cache events that we do support and so this change sets "cache-references" and "cache-misses" events in the same way as "L1-dcache-loads" and "L1-dcache-load-misses". And while at it, add debug info for cache events as well as a subtle fix in HW events debug info - config value is much better represented by hex so we may see not only event index but as well other control bits set (if they exist). Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARC: [plat-eznps] add missing atomic_fetch_xxx operationsNoam Camus1-0/+2
Build breakage since the last changes to generic atomic operations. Added a couple of missing macros which are now mandatory. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructionsVineet Gupta2-3/+260
ARCv2 ISA provides 64-bit exclusive load/stores so use them to implement the 64-bit atomics and elide the spinlock based generic 64-bit atomics. Boot tested with atomic64 self-test (and GOD bless the person who wrote them, I realized my inline assembly is sloppy as hell) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARCv2: Support dynamic peripheral address space in HS38 rel 3.0 coresVineet Gupta5-18/+23
HS release 3.0 provides for even more flexibility in specifying the volatile address space for mapping peripherals. With HS 2.1 @start was made flexible / programmable - with HS 3.0 even @end can be setup (vs. fixed to 0xFFFF_FFFF before). So add code to reflect that and while at it remove an unused struct definition Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARCv2: identify HS38 rel 3.0 coresVineet Gupta1-0/+1
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30ARCv2: Add support for ZeBu Emulation platform for HS coresVineet Gupta5-0/+330
The cool thing is that same kernel image can run on - nsim OSCI simulation platform - SDPlite FPGA setups Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30arc: Add "model" property in device tree description of all boardsAlexey Brodkin13-0/+13
As it was discussed quite some time ago (see https://lkml.org/lkml/2015/11/5/862) it's a good practice to add "model" property in .dts. Moreover as per ePAPR "model" property is required and should look like "manufacturer,model" so we do here. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jonas Gorski <jonas.gorski@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh@kernel.org> Cc: Christian Ruppert <christian.ruppert@alitech.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30MAINTAINERS: Switch to kernel.org email address for Javi MerinoJavi Merino2-1/+2
Change my email address to my kernel.org account instead of the ARM one. Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30x86/entry/64: Fix context tracking state warning when load_gs_index failsWanpeng Li1-2/+2
This warning: WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50 CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 enter_from_user_mode+0x32/0x50 error_entry+0x6d/0xc0 ? general_protection+0x12/0x30 ? native_load_gs_index+0xd/0x20 ? do_set_thread_area+0x19c/0x1f0 SyS_set_thread_area+0x24/0x30 do_int80_syscall_32+0x7c/0x220 entry_INT80_compat+0x38/0x50 ... can be reproduced by running the GS testcase of the ldt_gdt test unit in the x86 selftests. do_int80_syscall_32() will call enter_from_user_mode() to convert context tracking state from user state to kernel state. The load_gs_index() call can fail with user gsbase; if this happens, gsbase is fixed up and execution proceeds. However, enter_from_user_mode() will be called again in the fixed up path even though the context tracking state is already kernel state. This patch fixes it by just fixing up gsbase and telling lockdep that IRQs are off once load_gs_index() failed with user gsbase. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUIDAndy Lutomirski1-12/+11
Otherwise arch_task_struct_size == 0 and we die. While we're at it, set X86_FEATURE_ALWAYS, too. Reported-by: David Saggiorato <david@saggiorato.net> Tested-by: David Saggiorato <david@saggiorato.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86") Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30x86/asm: Get rid of __read_cr4_safe()Andy Lutomirski9-26/+11
We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30x86/vdso: Fix building on big endian hostSegher Boessenkool1-1/+1
We need to call GET_LE to read hdr->e_type. Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-next@vger.kernel.org Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
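The reason a little-endian accessor is needed on a big endian build host can be sketched as follows. The helper get_le16() is a hypothetical stand-in and not the GET_LE macro from the vDSO tooling; it only shows how to decode a 16-bit little-endian field, such as an ELF e_type, byte by byte so the result does not depend on the host's byte order.

```c
#include <stdint.h>

/* Decodes a 16-bit little-endian value independent of host byte order. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Reading the same two bytes through a native uint16_t load on a big endian host would swap them, so a comparison against an ELF constant such as ET_DYN (3) would fail even for a valid file.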
2016-09-30x86/boot: Fix another __read_cr4() case on 486Andy Lutomirski1-3/+1
The condition for reading CR4 was wrong: there are some CPUs with CPUID but not CR4. Rather than trying to make the condition exact, use __read_cr4_safe(). Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly") Reported-by: david@saggiorato.net Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30sched/irqtime: Consolidate irqtime flushing codeFrederic Weisbecker1-15/+11
The code performing irqtime nsecs stats flushing to kcpustat is roughly the same for hardirq and softirq. So let's consolidate that common code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-6-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/irqtime: Consolidate accounting synchronization with u64_stats APIFrederic Weisbecker2-55/+29
The irqtime accounting currently carries its own ad hoc implementation of the u64_stats API. Let's consolidate it with the appropriate library instead. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30u64_stats: Introduce IRQs disabled helpersFrederic Weisbecker1-21/+24
Introduce light versions of the u64_stats helpers for contexts where either preemption or IRQs are disabled. This way we can make this library usable by the scheduler irqtime accounting, which currently implements its own ad hoc version. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
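For readers unfamiliar with what a u64_stats-style helper protects against, here is a minimal user-space sketch of the underlying idea: on a machine that cannot load 64 bits atomically, a sequence counter lets a reader detect that it raced with an update and retry. This is not the kernel's u64_stats implementation; the `toy_` names are hypothetical, C11 atomics stand in for the kernel primitives, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy 64-bit counter stored as two 32-bit halves, guarded by a
 * sequence counter: even means stable, odd means update in progress. */
struct toy_u64_stat {
	atomic_uint seq;
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side. Assumes exactly one writer at a time, much as the
 * commit relies on preemption or IRQs already being disabled. */
static void toy_stat_add(struct toy_u64_stat *s, uint64_t delta)
{
	uint64_t v = ((uint64_t)atomic_load(&s->hi) << 32) | atomic_load(&s->lo);

	v += delta;
	atomic_fetch_add(&s->seq, 1);              /* becomes odd */
	atomic_store(&s->lo, (uint32_t)v);
	atomic_store(&s->hi, (uint32_t)(v >> 32));
	atomic_fetch_add(&s->seq, 1);              /* even again */
}

/* Reader side: retry while an update is in flight or one completed
 * between the two sequence reads. */
static uint64_t toy_stat_read(struct toy_u64_stat *s)
{
	unsigned int start;
	uint32_t lo, hi;

	do {
		start = atomic_load(&s->seq);
		lo = atomic_load(&s->lo);
		hi = atomic_load(&s->hi);
	} while ((start & 1) || atomic_load(&s->seq) != start);

	return ((uint64_t)hi << 32) | lo;
}
```

The sketch uses sequentially consistent atomics throughout for simplicity; a real implementation would use weaker, cheaper barriers.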
2016-09-30sched/irqtime: Remove needless IRQs disablement on kcpustat updateFrederic Weisbecker1-6/+5
The callers of the functions performing irqtime kcpustat updates already have IRQs disabled, so there is no need to disable them again. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/irqtime: No need for preempt-safe accessorsFrederic Weisbecker1-2/+2
We can safely use the preempt-unsafe accessors for irqtime when we flush its counters to kcpustat as IRQs are disabled at this time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/fair: Fix min_vruntime trackingPeter Zijlstra1-7/+22
While going through enqueue/dequeue to review the movement of set_curr_task() I noticed that the (2nd) update_min_vruntime() call in dequeue_entity() is suspect. It turns out it's actually wrong, because it will consider cfs_rq->curr, which could be the entry we just normalized. This mixes different vruntime forms and leads to failure. The purpose of the second update_min_vruntime() is to move min_vruntime forward if the entity we just removed is the one that was holding it back; _except_ for the DEQUEUE_SAVE case, because then we know it's a temporary removal and it will come back. However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr will still be set (and per the above, can be transformed into a different unit), so update_min_vruntime() should also consider curr->on_rq. This also fixes another corner case where the enqueue (which also does update_curr()->update_min_vruntime()) happens on the rq->lock break in schedule(), between dequeue and put_prev_task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 1e876231785d ("sched: Fix ->min_vruntime calculation in dequeue_entity()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
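The rule described above (consult the current entity only while it is still on the runqueue, and never move min_vruntime backwards) can be sketched in a few lines of user-space C. This is an illustration under assumptions, not the patched kernel function: the `toy_` types and helpers are hypothetical stand-ins, and the leftmost queued entity is passed as a plain pointer instead of being looked up in an rbtree.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; field names echo the commit message only. */
struct toy_entity {
	uint64_t vruntime;
	int on_rq;
};

struct toy_cfs_rq {
	uint64_t min_vruntime;
	struct toy_entity *curr;     /* running entity, may be NULL */
	struct toy_entity *leftmost; /* smallest queued vruntime, may be NULL */
};

/* Wraparound-safe min/max based on the signed difference. */
static uint64_t toy_min(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0 ? a : b;
}

static uint64_t toy_max(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0 ? a : b;
}

static void toy_update_min_vruntime(struct toy_cfs_rq *rq)
{
	struct toy_entity *curr = rq->curr;
	uint64_t v = rq->min_vruntime;

	/* A dequeued curr may already hold a normalized vruntime,
	 * i.e. a different unit, so it must not be considered. */
	if (curr && !curr->on_rq)
		curr = NULL;
	if (curr)
		v = curr->vruntime;

	if (rq->leftmost)
		v = curr ? toy_min(v, rq->leftmost->vruntime)
			 : rq->leftmost->vruntime;

	/* min_vruntime only ever moves forward. */
	rq->min_vruntime = toy_max(rq->min_vruntime, v);
}
```

Without the `on_rq` check, a small normalized value in `curr->vruntime` would win the minimum and hold min_vruntime back, which is the mixing of vruntime forms the commit describes.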
2016-09-30sched/debug: Add SCHED_WARN_ON()Peter Zijlstra2-6/+10
Provide SCHED_WARN_ON() as a wrapper for WARN_ON_ONCE() to avoid CONFIG_SCHED_DEBUG wrappery. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
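A user-space sketch of the wrapper idea: one macro that warns once per call site when a debug option is set and otherwise only evaluates its argument, so callers need no `#ifdef` of their own. Everything here is an assumption for illustration; the `SKETCH_` names are hypothetical, and the non-debug expansion shown is one plausible choice, not necessarily what the kernel patch does.

```c
#include <stdio.h>

/* Pretend the debug option is enabled for this sketch. */
#define SKETCH_SCHED_DEBUG 1

/* Counts emitted warnings so the behaviour can be observed. */
static int sketch_warn_count;

/* Stand-in for WARN_ON_ONCE(): each call site owns a static flag,
 * so a condition that stays true is reported only once. */
#define SKETCH_WARN_ON_ONCE(cond) do {					\
	static int warned_;						\
	if ((cond) && !warned_) {					\
		warned_ = 1;						\
		sketch_warn_count++;					\
		fprintf(stderr, "warning: %s at %s:%d\n",		\
			#cond, __FILE__, __LINE__);			\
	}								\
} while (0)

/* The wrapper: call sites stay free of #ifdef blocks. */
#ifdef SKETCH_SCHED_DEBUG
#define SKETCH_SCHED_WARN_ON(x)	SKETCH_WARN_ON_ONCE(x)
#else
#define SKETCH_SCHED_WARN_ON(x)	((void)(x))
#endif
```

A call site then reads `SKETCH_SCHED_WARN_ON(p == NULL);` regardless of how the build is configured.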
2016-09-30sched/core: Fix set_user_nice()Peter Zijlstra1-1/+7
Almost all scheduler functions update state with the following pattern:

	if (queued)
		dequeue_task(rq, p, DEQUEUE_SAVE);
	if (running)
		put_prev_task(rq, p);

	/* update state */

	if (queued)
		enqueue_task(rq, p, ENQUEUE_RESTORE);
	if (running)
		set_curr_task(rq, p);

set_user_nice() however misses the running part, cure this.

This was found by asserting we never enqueue 'current'.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
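The save/change/restore pattern can be exercised outside the kernel with a toy model that only records the order of the steps. The sketch below is not scheduler code: `toy_task`, `toy_log` and `toy_set_nice()` are hypothetical, and the real dequeue, put, enqueue and set-current operations are replaced by log entries.

```c
#include <assert.h>

/* The five steps of the pattern, in the order they should occur. */
enum toy_step {
	TOY_DEQUEUE, TOY_PUT_PREV, TOY_UPDATE, TOY_ENQUEUE, TOY_SET_CURR
};

struct toy_task {
	int queued;   /* task sits on a runqueue */
	int running;  /* task is the current task */
	int nice;
};

struct toy_log {
	enum toy_step steps[8];
	int n;
};

static void toy_record(struct toy_log *log, enum toy_step s)
{
	assert(log->n < 8);
	log->steps[log->n++] = s;
}

/* Change a property while honouring both halves of the pattern:
 * the queued part and the running part. */
static void toy_set_nice(struct toy_task *p, int nice, struct toy_log *log)
{
	int queued = p->queued;
	int running = p->running;

	if (queued)
		toy_record(log, TOY_DEQUEUE);
	if (running)
		toy_record(log, TOY_PUT_PREV);

	p->nice = nice;               /* the actual state update */
	toy_record(log, TOY_UPDATE);

	if (queued)
		toy_record(log, TOY_ENQUEUE);
	if (running)
		toy_record(log, TOY_SET_CURR);
}
```

For a task that is both queued and running, the log comes out as dequeue, put, update, enqueue, set current; dropping the two `running` branches reproduces the omission the commit fixes.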
2016-09-30sched/fair: Introduce set_curr_task() helperPeter Zijlstra2-5/+10
Now that the ia64-only set_curr_task() symbol is gone, provide a helper just like put_prev_task(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/core, ia64: Rename set_curr_task()Peter Zijlstra3-7/+7
Rename the ia64 only set_curr_task() function to free up the name. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/core: Fix incorrect utilization accounting when switching to fair classVincent Guittot1-10/+10
When a task switches to the fair scheduling class, the period between now and the last update of its utilization is accounted as running time, whatever happened during this period. This incorrect accounting applies to the task and also to the task group branch.

When changing the property of a running task like its list of allowed CPUs or its scheduling class, we follow the sequence:

 - dequeue task
 - put task
 - change the property
 - set task as current task
 - enqueue task

The end of the sequence doesn't follow the normal sequence (as per __schedule()) which is:

 - enqueue a task
 - then set the task as current task.

This incorrect ordering is the root cause of the incorrect utilization accounting.

Update the sequence to follow the right one:

 - dequeue task
 - put task
 - change the property
 - enqueue task
 - set task as current task

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: linaro-kernel@lists.linaro.org Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1473666472-13749-8-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>