diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-31 06:01:36 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-31 06:01:36 +0200 |
commit | bad60e6f259a01cf9f29a1ef8d435ab6c60b2de9 (patch) | |
tree | c9aaa8166735659761239c117af2b11b022bc6cb /arch/powerpc/kernel | |
parent | random: Fix crashes with sparse node ids (diff) | |
parent | Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwoo... (diff) | |
download | linux-bad60e6f259a01cf9f29a1ef8d435ab6c60b2de9.tar.xz linux-bad60e6f259a01cf9f29a1ef8d435ab6c60b2de9.zip |
Merge tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- PowerNV PCI hotplug support.
- Lots more Power9 support.
- eBPF JIT support on ppc64le.
- Lots of cxl updates.
- Boot code consolidation.
Bug fixes:
- Fix spin_unlock_wait() from Boqun Feng
- Fix stack pointer corruption in __tm_recheckpoint() from Michael
Neuling
- Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
- mm: Ensure "special" zones are empty from Oliver O'Halloran
- ftrace: Separate the heuristics for checking call sites from
Michael Ellerman
- modules: Never restore r2 for a mprofile-kernel style mcount() call
from Michael Ellerman
- Fix endianness when reading TCEs from Alexey Kardashevskiy
- start rtasd before PCI probing from Greg Kurz
- PCI: rpaphp: Fix slot registration for multiple slots under a PHB
from Tyrel Datwyler
- powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev
Bhattiprolu
Cleanups & fixes:
- Drop support for MPIC in pseries from Rashmica Gupta
- Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
- Remove unused symbols in asm-offsets.c from Rashmica Gupta
- Fix SRIOV not building without EEH enabled from Russell Currey
- Remove kretprobe_trampoline_holder from Thiago Jung Bauermann
- Reduce log level of PCI I/O space warning from Benjamin
Herrenschmidt
- Add array bounds checking to crash_shutdown_handlers from Suraj
Jitindar Singh
- Avoid -maltivec when using clang integrated assembler from Anton
Blanchard
- Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
- Fix error return value in cmm_mem_going_offline() from Rasmus
Villemoes
- export cpu_to_core_id() from Mauricio Faria de Oliveira
- Remove old symbols from defconfigs from Andrew Donnellan
- Update obsolete comments in setup_32.c about entry conditions from
Benjamin Herrenschmidt
- Add comment explaining the purpose of setup_kdump_trampoline() from
Benjamin Herrenschmidt
- Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin
Hao
- Remove RELOCATABLE_PPC32 from Kevin Hao
- Fix .long's in tlb-radix.c to more meaningful from Balbir Singh
Minor cleanups & fixes:
- Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King,
Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman,
Michael Ellerman, Stephen Rothwell, Stewart Smith.
Freescale updates from Scott:
- "Highlights include more 8xx optimizations, device tree updates,
and MVME7100 support."
PowerNV PCI hotplug from Gavin Shan:
- PCI: Add pcibios_setup_bridge()
- Override pcibios_setup_bridge()
- Remove PCI_RESET_DELAY_US
- Move pnv_pci_ioda_setup_opal_tce_kill() around
- Increase PE# capacity
- Allocate PE# in reverse order
- Create PEs in pcibios_setup_bridge()
- Setup PE for root bus
- Extend PCI bridge resources
- Make pnv_ioda_deconfigure_pe() visible
- Dynamically release PE
- Update bridge windows on PCI plug
- Delay populating pdn
- Support PCI slot ID
- Use PCI slot reset infrastructure
- Introduce pnv_pci_get_slot_id()
- Functions to get/set PCI slot state
- PCI/hotplug: PowerPC PowerNV PCI hotplug driver
- Print correct PHB type names
Power9 idle support from Shreyas B. Prabhu:
- set power_save func after the idle states are initialized
- Use PNV_THREAD_WINKLE macro while requesting for winkle
- make hypervisor state restore a function
- Rename idle_power7.S to idle_book3s.S
- Rename reusable idle functions to hardware agnostic names
- Make pnv_powersave_common more generic
- abstraction for saving SPRs before entering deep idle states
- Add platform support for stop instruction
- cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
- cpuidle/powernv: cleanup cpuidle-powernv.c
- cpuidle/powernv: Add support for POWER ISA v3 idle states
- Use deepest stop state when cpu is offlined
Power9 PMU from Madhavan Srinivasan:
- factor out power8 pmu macros and defines
- factor out power8 pmu functions
- factor out power8 __init_pmu code
- Add power9 event list macros for generic and cache events
- Power9 PMU support
- Export Power9 generic and cache events to sysfs
Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
- Add XICS emulation APIs
- Move a few exception common handlers to make room
- Add support for HV virtualization interrupts
- Add mechanism to force a replay of interrupts
- Add ICP OPAL backend
- Discover IODA3 PHBs
- pci: Remove obsolete SW invalidate
- opal: Add real mode call wrappers
- Rename TCE invalidation calls
- Remove SWINV constants and obsolete TCE code
- Rework accessing the TCE invalidate register
- Fallback to OPAL for TCE invalidations
- Use the device-tree to get available range of M64's
- Check status of a PHB before using it
- pci: Don't try to allocate resources that will be reassigned
Other Power9:
- Send SIGBUS on unaligned copy and paste from Chris Smart
- Large Decrementer support from Oliver O'Halloran
- Load Monitor Register Support from Jack Miller
Performance improvements from Anton Blanchard:
- Avoid load hit store in __giveup_fpu() and __giveup_altivec()
- Avoid load hit store in setup_sigcontext()
- Remove assembly versions of strcpy, strcat, strlen and strcmp
- Align hot loops of some string functions
eBPF JIT from Naveen N. Rao:
- Fix/enhance 32-bit Load Immediate implementation
- Optimize 64-bit Immediate loads
- Introduce rotate immediate instructions
- A few cleanups
- Isolate classic BPF JIT specifics into a separate header
- Implement JIT compiler for extended BPF
Operator Panel driver from Suraj Jitindar Singh:
- devicetree/bindings: Add binding for operator panel on FSP machines
- Add inline function to get rc from an ASYNC_COMP opal_msg
- Add driver for operator panel on FSP machines
Sparse fixes from Daniel Axtens:
- make some things static
- Introduce asm-prototypes.h
- Include headers containing prototypes
- Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
- kvm: Clarify __user annotations
- Pass endianness to sparse
- Make ppc_md.{halt, restart} __noreturn
MM fixes & cleanups from Aneesh Kumar K.V:
- radix: Update LPCR HR bit as per ISA
- use _raw variant of page table accessors
- Compile out radix related functions if RADIX_MMU is disabled
- Clear top 16 bits of va only on older cpus
- Print formation regarding the the MMU mode
- hash: Update SDR1 size encoding as documented in ISA 3.0
- radix: Update PID switch sequence
- radix: Update machine call back to support new HCALL.
- radix: Add LPID based tlb flush helpers
- radix: Add a kernel command line to disable radix
- Cleanup LPCR defines
Boot code consolidation from Benjamin Herrenschmidt:
- Move epapr_paravirt_early_init() to early_init_devtree()
- cell: Don't use flat device-tree after boot
- ge_imp3a: Don't use the flat device-tree after boot
- mpc85xx_ds: Don't use the flat device-tree after boot
- mpc85xx_rdb: Don't use the flat device-tree after boot
- Don't test for machine type in rtas_initialize()
- Don't test for machine type in smp_setup_cpu_maps()
- dt: Add of_device_compatible_match()
- Factor do_feature_fixup calls
- Move 64-bit feature fixup earlier
- Move 64-bit memory reserves to setup_arch()
- Use a cachable DART
- Move FW feature probing out of pseries probe()
- Put exception configuration in a common place
- Remove early allocation of the SMU command buffer
- Move MMU backend selection out of platform code
- pasemi: Remove IOBMAP allocation from platform probe()
- mm/hash: Don't use machine_is() early during boot
- Don't test for machine type to detect HEA special case
- pmac: Remove spurrious machine type test
- Move hash table ops to a separate structure
- Ensure that ppc_md is empty before probing for machine type
- Move 64-bit probe_machine() to later in the boot process
- Move 32-bit probe() machine to later in the boot process
- Get rid of ppc_md.init_early()
- Move the boot time info banner to a separate function
- Move setting of {i,d}cache_bsize to initialize_cache_info()
- Move the content of setup_system() to setup_arch()
- Move cache info inits to a separate function
- Re-order the call to smp_setup_cpu_maps()
- Re-order setup_panic()
- Make a few boot functions __init
- Merge 32-bit and 64-bit setup_arch()
Other new features:
- tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
- tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
- powerpc: Add module autoloading based on CPU features from Alastair D'Silva
- crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
- Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
- xmon: Dump ISA 2.06 SPRs from Michael Ellerman
- xmon: Dump ISA 2.07 SPRs from Michael Ellerman
- Add a parameter to disable 1TB segs from Oliver O'Halloran
- powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
- Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
- pseries: Add pseries hotplug workqueue from John Allen
- pseries: Add support for hotplug interrupt source from John Allen
- pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
- pseries: Move property cloning into its own routine from Nathan Fontenot
- pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
- pseries: Auto-online hotplugged memory from Nathan Fontenot
- pseries: Remove call to memblock_add() from Nathan Fontenot
cxl:
- Add set and get private data to context struct from Michael Neuling
- make base more explicitly non-modular from Paul Gortmaker
- Use for_each_compatible_node() macro from Wei Yongjun
- Frederic Barrat
- Abstract the differences between the PSL and XSL
- Make vPHB device node match adapter's
- Philippe Bergheaud
- Add mechanism for delivering AFU driver specific events
- Ignore CAPI adapters misplaced in switched slots
- Refine slice error debug messages
- Andrew Donnellan
- static-ify variables to fix sparse warnings
- PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
- PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
- Add cxl_check_and_switch_mode() API to switch bi-modal cards
- remove dead Kconfig options
- fix potential NULL dereference in free_adapter()
- Ian Munsie
- Update process element after allocating interrupts
- Add support for CAPP DMA mode
- Fix allowing bogus AFU descriptors with 0 maximum processes
- Fix allocating a minimum of 2 pages for the SPA
- Fix bug where AFU disable operation had no effect
- Workaround XSL bug that does not clear the RA bit after a reset
- Fix NULL pointer dereference on kernel contexts with no AFU interrupts
- powerpc/powernv: Split cxl code out into a separate file
- Add cxl_slot_is_supported API
- Enable bus mastering for devices using CAPP DMA mode
- Move cxl_afu_get / cxl_afu_put to base
- Allow a default context to be associated with an external pci_dev
- Do not create vPHB if there are no AFU configuration records
- powerpc/powernv: Add support for the cxl kernel api on the real phb
- Add support for using the kernel API with a real PHB
- Add kernel APIs to get & set the max irqs per context
- Add preliminary workaround for CX4 interrupt limitation
- Add support for interrupts on the Mellanox CX4
- Workaround PE=0 hardware limitation in Mellanox CX4
- powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
selftests:
- Test unaligned copy and paste from Chris Smart
- Load Monitor Register Tests from Jack Miller
- Cyril Bur
- exec() with suspended transaction
- Use signed long to read perf_event_paranoid
- Fix usage message in context_switch
- Fix generation of vector instructions/types in context_switch
- Michael Ellerman
- Use "Delta" rather than "Error" in normal output
- Import Anton's mmap & futex micro benchmarks
- Add a test for PROT_SAO"
* tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits)
powerpc/mm: Parenthesise IS_ENABLED() in if condition
tty/hvc: Use opal irqchip interface if available
tty/hvc: Use IRQF_SHARED for OPAL hvc consoles
selftests/powerpc: exec() with suspended transaction
powerpc: Improve comment explaining why we modify VRSAVE
powerpc/mm: Drop unused externs for hpte_init_beat[_v3]()
powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header
powerpc/mm: Fix build break when PPC_NATIVE=n
crypto: vmx - Convert to CPU feature based module autoloading
powerpc: Add module autoloading based on CPU features
powerpc/powernv/ioda: Fix endianness when reading TCEs
powerpc/mm: Add memory barrier in __hugepte_alloc()
powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call
powerpc/ftrace: Separate the heuristics for checking call sites
powerpc: Merge 32-bit and 64-bit setup_arch()
powerpc/64: Make a few boot functions __init
powerpc: Re-order setup_panic()
powerpc: Re-order the call to smp_setup_cpu_maps()
powerpc/32: Move cache info inits to a separate function
powerpc/64: Move the content of setup_system() to setup_arch()
...
Diffstat (limited to 'arch/powerpc/kernel')
45 files changed, 1201 insertions, 784 deletions
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 2da380fcc34c..fe4c075bcf50 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -42,12 +42,11 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o -obj64-$(CONFIG_RELOCATABLE) += reloc_64.o obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o obj-$(CONFIG_PPC64) += vdso64/ obj-$(CONFIG_ALTIVEC) += vecemu.o obj-$(CONFIG_PPC_970_NAP) += idle_power4.o -obj-$(CONFIG_PPC_P7_NAP) += idle_power7.o +obj-$(CONFIG_PPC_P7_NAP) += idle_book3s.o procfs-y := proc_powerpc.o obj-$(CONFIG_PROC_FS) += $(procfs-y) rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI) := rtas_pci.o @@ -87,7 +86,7 @@ extra-$(CONFIG_FSL_BOOKE) := head_fsl_booke.o extra-$(CONFIG_8xx) := head_8xx.o extra-y += vmlinux.lds -obj-$(CONFIG_RELOCATABLE_PPC32) += reloc_32.o +obj-$(CONFIG_RELOCATABLE) += reloc_$(CONFIG_WORD_SIZE).o obj-$(CONFIG_PPC32) += entry_32.o setup_32.o obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 8e7cb8e2b21a..c7097f933114 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -228,9 +228,7 @@ static int emulate_dcbz(struct pt_regs *regs, unsigned char __user *addr) #else #define REG_BYTE(rp, i) *((u8 *)(rp) + (i)) #endif -#endif - -#ifdef __LITTLE_ENDIAN__ +#else #define REG_BYTE(rp, i) (*(((u8 *)((rp) + ((i)>>2)) + ((i)&3)))) #endif @@ -875,6 +873,20 @@ int fix_alignment(struct pt_regs *regs) return emulate_vsx(addr, reg, areg, regs, flags, nb, elsize); } #endif + + /* + * ISA 3.0 (such as P9) copy, copy_first, paste and paste_last alignment + * check. + * + * Send a SIGBUS to the process that caused the fault. + * + * We do not emulate these because paste may contain additional metadata + * when pasting to a co-processor. Furthermore, paste_last is the + * synchronisation point for preceding copy/paste sequences. + */ + if ((instruction & 0xfc0006fe) == PPC_INST_COPY) + return -EIO; + /* A size of 0 indicates an instruction we don't support, with * the exception of DCBZ which is handled as a special case here */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9ea09551a2cd..b89d14c0352c 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -68,17 +68,18 @@ #include "../mm/mmu_decl.h" #endif +#ifdef CONFIG_PPC_8xx +#include <asm/fixmap.h> +#endif + int main(void) { DEFINE(THREAD, offsetof(struct task_struct, thread)); DEFINE(MM, offsetof(struct task_struct, mm)); DEFINE(MMCONTEXTID, offsetof(struct mm_struct, context.id)); #ifdef CONFIG_PPC64 - DEFINE(AUDITCONTEXT, offsetof(struct task_struct, audit_context)); DEFINE(SIGSEGV, SIGSEGV); DEFINE(NMI_MASK, NMI_MASK); - DEFINE(THREAD_DSCR, offsetof(struct thread_struct, dscr)); - DEFINE(THREAD_DSCR_INHERIT, offsetof(struct thread_struct, dscr_inherit)); DEFINE(TASKTHREADPPR, offsetof(struct task_struct, thread.ppr)); #else DEFINE(THREAD_INFO, offsetof(struct task_struct, stack)); @@ -132,17 +133,6 @@ int main(void) DEFINE(THREAD_KVM_VCPU, offsetof(struct thread_struct, kvm_vcpu)); #endif -#ifdef CONFIG_PPC_BOOK3S_64 - DEFINE(THREAD_TAR, offsetof(struct thread_struct, tar)); - DEFINE(THREAD_BESCR, offsetof(struct thread_struct, bescr)); - DEFINE(THREAD_EBBHR, offsetof(struct thread_struct, ebbhr)); - DEFINE(THREAD_EBBRR, offsetof(struct thread_struct, ebbrr)); - DEFINE(THREAD_SIAR, offsetof(struct thread_struct, siar)); - DEFINE(THREAD_SDAR, offsetof(struct thread_struct, sdar)); - DEFINE(THREAD_SIER, offsetof(struct thread_struct, sier)); - DEFINE(THREAD_MMCR0, offsetof(struct thread_struct, mmcr0)); - DEFINE(THREAD_MMCR2, offsetof(struct thread_struct, mmcr2)); -#endif #ifdef CONFIG_PPC_TRANSACTIONAL_MEM DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch)); DEFINE(THREAD_TM_TFHAR, offsetof(struct thread_struct, tm_tfhar)); @@ -178,7 +168,6 @@ int main(void) DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); /* paca */ DEFINE(PACA_SIZE, sizeof(struct paca_struct)); - DEFINE(PACA_LOCK_TOKEN, offsetof(struct paca_struct, lock_token)); DEFINE(PACAPACAINDEX, offsetof(struct paca_struct, paca_index)); DEFINE(PACAPROCSTART, offsetof(struct paca_struct, cpu_start)); DEFINE(PACAKSAVE, offsetof(struct paca_struct, kstack)); @@ -255,13 +244,28 @@ int main(void) DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state)); DEFINE(PACA_DSCR_DEFAULT, offsetof(struct paca_struct, dscr_default)); - DEFINE(PACA_STARTTIME, offsetof(struct paca_struct, starttime)); - DEFINE(PACA_STARTTIME_USER, offsetof(struct paca_struct, starttime_user)); - DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time)); - DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time)); + DEFINE(ACCOUNT_STARTTIME, + offsetof(struct paca_struct, accounting.starttime)); + DEFINE(ACCOUNT_STARTTIME_USER, + offsetof(struct paca_struct, accounting.starttime_user)); + DEFINE(ACCOUNT_USER_TIME, + offsetof(struct paca_struct, accounting.user_time)); + DEFINE(ACCOUNT_SYSTEM_TIME, + offsetof(struct paca_struct, accounting.system_time)); DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save)); DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost)); DEFINE(PACA_SPRG_VDSO, offsetof(struct paca_struct, sprg_vdso)); +#else /* CONFIG_PPC64 */ +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE + DEFINE(ACCOUNT_STARTTIME, + offsetof(struct thread_info, accounting.starttime)); + DEFINE(ACCOUNT_STARTTIME_USER, + offsetof(struct thread_info, accounting.starttime_user)); + DEFINE(ACCOUNT_USER_TIME, + offsetof(struct thread_info, accounting.user_time)); + DEFINE(ACCOUNT_SYSTEM_TIME, + offsetof(struct thread_info, accounting.system_time)); +#endif #endif /* CONFIG_PPC64 */ /* RTAS */ @@ -275,12 +279,6 @@ int main(void) /* Create extra stack space for SRR0 and SRR1 when calling prom/rtas. */ DEFINE(PROM_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 16); DEFINE(RTAS_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 16); - - /* hcall statistics */ - DEFINE(HCALL_STAT_SIZE, sizeof(struct hcall_stats)); - DEFINE(HCALL_STAT_CALLS, offsetof(struct hcall_stats, num_calls)); - DEFINE(HCALL_STAT_TB, offsetof(struct hcall_stats, tb_total)); - DEFINE(HCALL_STAT_PURR, offsetof(struct hcall_stats, purr_total)); #endif /* CONFIG_PPC64 */ DEFINE(GPR0, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[0])); DEFINE(GPR1, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[1])); @@ -298,23 +296,6 @@ int main(void) DEFINE(GPR13, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[13])); #ifndef CONFIG_PPC64 DEFINE(GPR14, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[14])); - DEFINE(GPR15, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[15])); - DEFINE(GPR16, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[16])); - DEFINE(GPR17, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[17])); - DEFINE(GPR18, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[18])); - DEFINE(GPR19, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[19])); - DEFINE(GPR20, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[20])); - DEFINE(GPR21, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[21])); - DEFINE(GPR22, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[22])); - DEFINE(GPR23, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[23])); - DEFINE(GPR24, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[24])); - DEFINE(GPR25, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[25])); - DEFINE(GPR26, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[26])); - DEFINE(GPR27, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[27])); - DEFINE(GPR28, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[28])); - DEFINE(GPR29, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[29])); - DEFINE(GPR30, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[30])); - DEFINE(GPR31, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, gpr[31])); #endif /* CONFIG_PPC64 */ /* * Note: these symbols include _ because they overlap with special @@ -332,7 +313,6 @@ int main(void) DEFINE(RESULT, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, result)); DEFINE(_TRAP, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, trap)); #ifndef CONFIG_PPC64 - DEFINE(_MQ, STACK_FRAME_OVERHEAD+offsetof(struct pt_regs, mq)); /* * The PowerPC 400-class & Book-E processors have neither the DAR * nor the DSISR SPRs. Hence, we overload them to hold the similar @@ -369,8 +349,6 @@ int main(void) DEFINE(SAVED_KSP_LIMIT, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, saved_ksp_limit)); #endif #endif - DEFINE(CLONE_VM, CLONE_VM); - DEFINE(CLONE_UNTRACED, CLONE_UNTRACED); #ifndef CONFIG_PPC64 DEFINE(MM_PGD, offsetof(struct mm_struct, pgd)); @@ -380,7 +358,6 @@ int main(void) DEFINE(CPU_SPEC_FEATURES, offsetof(struct cpu_spec, cpu_features)); DEFINE(CPU_SPEC_SETUP, offsetof(struct cpu_spec, cpu_setup)); DEFINE(CPU_SPEC_RESTORE, offsetof(struct cpu_spec, cpu_restore)); - DEFINE(CPU_DOWN_FLUSH, offsetof(struct cpu_spec, cpu_down_flush)); DEFINE(pbe_address, offsetof(struct pbe, address)); DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address)); @@ -395,7 +372,6 @@ int main(void) DEFINE(CFG_TB_ORIG_STAMP, offsetof(struct vdso_data, tb_orig_stamp)); DEFINE(CFG_TB_TICKS_PER_SEC, offsetof(struct vdso_data, tb_ticks_per_sec)); DEFINE(CFG_TB_TO_XS, offsetof(struct vdso_data, tb_to_xs)); - DEFINE(CFG_STAMP_XSEC, offsetof(struct vdso_data, stamp_xsec)); DEFINE(CFG_TB_UPDATE_COUNT, offsetof(struct vdso_data, tb_update_count)); DEFINE(CFG_TZ_MINUTEWEST, offsetof(struct vdso_data, tz_minuteswest)); DEFINE(CFG_TZ_DSTTIME, offsetof(struct vdso_data, tz_dsttime)); @@ -517,7 +493,6 @@ int main(void) DEFINE(KVM_HOST_SDR1, offsetof(struct kvm, arch.host_sdr1)); DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits)); DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls)); - DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr)); DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v)); DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr)); DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); @@ -528,7 +503,6 @@ int main(void) DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu)); #endif #ifdef CONFIG_PPC_BOOK3S - DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id)); DEFINE(VCPU_PURR, offsetof(struct kvm_vcpu, arch.purr)); DEFINE(VCPU_SPURR, offsetof(struct kvm_vcpu, arch.spurr)); DEFINE(VCPU_IC, offsetof(struct kvm_vcpu, arch.ic)); @@ -566,7 +540,6 @@ int main(void) DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar)); DEFINE(VCPU_PPR, offsetof(struct kvm_vcpu, arch.ppr)); DEFINE(VCPU_FSCR, offsetof(struct kvm_vcpu, arch.fscr)); - DEFINE(VCPU_SHADOW_FSCR, offsetof(struct kvm_vcpu, arch.shadow_fscr)); DEFINE(VCPU_PSPB, offsetof(struct kvm_vcpu, arch.pspb)); DEFINE(VCPU_EBBHR, offsetof(struct kvm_vcpu, arch.ebbhr)); DEFINE(VCPU_EBBRR, offsetof(struct kvm_vcpu, arch.ebbrr)); @@ -576,7 +549,6 @@ int main(void) DEFINE(VCPU_TCSCR, offsetof(struct kvm_vcpu, arch.tcscr)); DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop)); DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort)); - DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1)); DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map)); DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest)); DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads)); @@ -693,7 +665,6 @@ int main(void) DEFINE(KVM_SPLIT_RPR, offsetof(struct kvm_split_mode, rpr)); DEFINE(KVM_SPLIT_PMMAR, offsetof(struct kvm_split_mode, pmmar)); DEFINE(KVM_SPLIT_LDBAR, offsetof(struct kvm_split_mode, ldbar)); - DEFINE(KVM_SPLIT_SIZE, offsetof(struct kvm_split_mode, subcore_size)); DEFINE(KVM_SPLIT_DO_NAP, offsetof(struct kvm_split_mode, do_nap)); DEFINE(KVM_SPLIT_NAPPED, offsetof(struct kvm_split_mode, napped)); #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ @@ -756,7 +727,6 @@ int main(void) #ifdef CONFIG_KVM_BOOKE_HV DEFINE(VCPU_HOST_MAS4, offsetof(struct kvm_vcpu, arch.host_mas4)); DEFINE(VCPU_HOST_MAS6, offsetof(struct kvm_vcpu, arch.host_mas6)); - DEFINE(VCPU_EPLC, offsetof(struct kvm_vcpu, arch.eplc)); #endif #ifdef CONFIG_KVM_EXIT_TIMING @@ -783,5 +753,9 @@ int main(void) DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); +#ifdef CONFIG_PPC_8xx + DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE)); +#endif + return 0; } diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S index f8cd9fba4d35..c5e5a94d9892 100644 --- a/arch/powerpc/kernel/cpu_setup_6xx.S +++ b/arch/powerpc/kernel/cpu_setup_6xx.S @@ -156,7 +156,7 @@ setup_7410_workarounds: blr /* 740/750/7400/7410 - * Enable Store Gathering (SGE), Address Brodcast (ABE), + * Enable Store Gathering (SGE), Address Broadcast (ABE), * Branch History Table (BHTE), Branch Target ICache (BTIC) * Dynamic Power Management (DPM), Speculative (SPD) * Clear Instruction cache throttling (ICTC) diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S index 584e119fa8b0..52ff3f025437 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -51,6 +51,7 @@ _GLOBAL(__setup_cpu_power8) mflr r11 bl __init_FSCR bl __init_PMU + bl __init_PMU_ISA207 bl __init_hvmode_206 mtlr r11 beqlr @@ -62,6 +63,7 @@ _GLOBAL(__setup_cpu_power8) bl __init_HFSCR bl __init_tlb_power8 bl __init_PMU_HV + bl __init_PMU_HV_ISA207 mtlr r11 blr @@ -69,6 +71,7 @@ _GLOBAL(__restore_cpu_power8) mflr r11 bl __init_FSCR bl __init_PMU + bl __init_PMU_ISA207 mfmsr r3 rldicl. r0,r3,4,63 mtlr r11 @@ -81,12 +84,14 @@ _GLOBAL(__restore_cpu_power8) bl __init_HFSCR bl __init_tlb_power8 bl __init_PMU_HV + bl __init_PMU_HV_ISA207 mtlr r11 blr _GLOBAL(__setup_cpu_power9) mflr r11 bl __init_FSCR + bl __init_PMU bl __init_hvmode_206 mtlr r11 beqlr @@ -94,15 +99,18 @@ _GLOBAL(__setup_cpu_power9) mtspr SPRN_LPID,r0 mfspr r3,SPRN_LPCR ori r3, r3, LPCR_PECEDH + ori r3, r3, LPCR_HVICE bl __init_LPCR bl __init_HFSCR bl __init_tlb_power9 + bl __init_PMU_HV mtlr r11 blr _GLOBAL(__restore_cpu_power9) mflr r11 bl __init_FSCR + bl __init_PMU mfmsr r3 rldicl. r0,r3,4,63 mtlr r11 @@ -111,9 +119,11 @@ _GLOBAL(__restore_cpu_power9) mtspr SPRN_LPID,r0 mfspr r3,SPRN_LPCR ori r3, r3, LPCR_PECEDH + ori r3, r3, LPCR_HVICE bl __init_LPCR bl __init_HFSCR bl __init_tlb_power9 + bl __init_PMU_HV mtlr r11 blr @@ -208,14 +218,22 @@ __init_tlb_power9: __init_PMU_HV: li r5,0 mtspr SPRN_MMCRC,r5 + blr + +__init_PMU_HV_ISA207: + li r5,0 mtspr SPRN_MMCRH,r5 blr __init_PMU: li r5,0 - mtspr SPRN_MMCRS,r5 mtspr SPRN_MMCRA,r5 mtspr SPRN_MMCR0,r5 mtspr SPRN_MMCR1,r5 mtspr SPRN_MMCR2,r5 blr + +__init_PMU_ISA207: + li r5,0 + mtspr SPRN_MMCRS,r5 + blr diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index eeeacf6235a3..d81f826d1029 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -137,7 +137,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .cpu_name = "POWER4 (gp)", .cpu_features = CPU_FTRS_POWER4, .cpu_user_features = COMMON_USER_POWER4, - .mmu_features = MMU_FTRS_POWER4, + .mmu_features = MMU_FTRS_POWER4 | MMU_FTR_TLBIE_CROP_VA, .icache_bsize = 128, .dcache_bsize = 128, .num_pmcs = 8, @@ -152,7 +152,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .cpu_name = "POWER4+ (gq)", .cpu_features = CPU_FTRS_POWER4, .cpu_user_features = COMMON_USER_POWER4, - .mmu_features = MMU_FTRS_POWER4, + .mmu_features = MMU_FTRS_POWER4 | MMU_FTR_TLBIE_CROP_VA, .icache_bsize = 128, .dcache_bsize = 128, .num_pmcs = 8, diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index 2bb252c01f07..47b63de81f9b 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -48,8 +48,8 @@ int crashing_cpu = -1; static int time_to_dump; #define CRASH_HANDLER_MAX 3 -/* NULL terminated list of shutdown handles */ -static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX+1]; +/* List of shutdown handles */ +static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX]; static DEFINE_SPINLOCK(crash_handlers_lock); static unsigned long crash_shutdown_buf[JMP_BUF_LEN]; @@ -65,7 +65,7 @@ static int handle_fault(struct pt_regs *regs) #ifdef CONFIG_SMP static atomic_t cpus_in_crash; -void crash_ipi_callback(struct pt_regs *regs) +static void crash_ipi_callback(struct pt_regs *regs) { static cpumask_t cpus_state_saved = CPU_MASK_NONE; @@ -288,9 +288,14 @@ int crash_shutdown_unregister(crash_shutdown_t handler) rc = 1; } else { /* Shift handles down */ - for (; crash_shutdown_handles[i]; i++) + for (; i < (CRASH_HANDLER_MAX - 1); i++) crash_shutdown_handles[i] = crash_shutdown_handles[i+1]; + /* + * Reset last entry to NULL now that it has been shifted down, + * this will allow new handles to be added here. + */ + crash_shutdown_handles[i] = NULL; rc = 0; } @@ -346,7 +351,7 @@ void default_machine_crash_shutdown(struct pt_regs *regs) old_handler = __debugger_fault_handler; __debugger_fault_handler = handle_fault; crash_shutdown_cpu = smp_processor_id(); - for (i = 0; crash_shutdown_handles[i]; i++) { + for (i = 0; i < CRASH_HANDLER_MAX && crash_shutdown_handles[i]; i++) { if (setjmp(crash_shutdown_buf) == 0) { /* * Insert syncs and delay to ensure diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index ddbcfab7efdf..d4cc26618809 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -114,9 +114,9 @@ static void eeh_addr_cache_print(struct pci_io_addr_cache *cache) while (n) { struct pci_io_addr_range *piar; piar = rb_entry(n, struct pci_io_addr_range, rb_node); - pr_debug("PCI: %s addr range %d [%lx-%lx]: %s\n", + pr_debug("PCI: %s addr range %d [%pap-%pap]: %s\n", (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev)); cnt++; n = rb_next(n); } @@ -159,8 +159,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo, piar->flags = flags; #ifdef DEBUG - pr_debug("PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name(dev)); + pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n", + &alo, &ahi, pci_name(dev)); #endif rb_link_node(&piar->rb_node, parent, p); diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c index 7815095fe3d8..d6b2ca70d14d 100644 --- a/arch/powerpc/kernel/eeh_dev.c +++ b/arch/powerpc/kernel/eeh_dev.c @@ -44,14 +44,13 @@ /** * eeh_dev_init - Create EEH device according to OF node * @pdn: PCI device node - * @data: PHB * * It will create EEH device according to the given OF node. The function * might be called by PCI emunation, DR, PHB hotplug. */ -void *eeh_dev_init(struct pci_dn *pdn, void *data) +struct eeh_dev *eeh_dev_init(struct pci_dn *pdn) { - struct pci_controller *phb = data; + struct pci_controller *phb = pdn->phb; struct eeh_dev *edev; /* Allocate EEH device */ @@ -69,7 +68,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data) INIT_LIST_HEAD(&edev->list); INIT_LIST_HEAD(&edev->rmv_list); - return NULL; + return edev; } /** @@ -81,16 +80,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data) */ void eeh_dev_phb_init_dynamic(struct pci_controller *phb) { - struct pci_dn *root = phb->pci_data; - /* EEH PE for PHB */ eeh_phb_pe_create(phb); - - /* EEH device for PHB */ - eeh_dev_init(root, phb); - - /* EEH devices for children OF nodes */ - traverse_pci_dn(root, eeh_dev_init, phb); } /** @@ -106,8 +97,6 @@ static int __init eeh_dev_phb_init(void) list_for_each_entry_safe(phb, tmp, &hose_list, list_node) eeh_dev_phb_init_dynamic(phb); - pr_info("EEH: devices created\n"); - return 0; } diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index d70101e1e25c..5f36e8a70daa 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -139,7 +139,7 @@ static void eeh_enable_irq(struct pci_dev *dev) * into it. * * That's just wrong.The warning in the core code is - * there to tell people to fix their assymetries in + * there to tell people to fix their asymmetries in * their own code, not by abusing the core information * to avoid it. * diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 2405631e91a2..9899032230b4 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -175,6 +175,12 @@ transfer_to_handler: addi r12,r12,-1 stw r12,4(r11) #endif +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE + CURRENT_THREAD_INFO(r9, r1) + tophys(r9, r9) + ACCOUNT_CPU_USER_ENTRY(r9, r11, r12) +#endif + b 3f 2: /* if from kernel, check interrupted DOZE/NAP mode and @@ -398,6 +404,13 @@ BEGIN_FTR_SECTION lwarx r7,0,r1 END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) stwcx. r0,0,r1 /* to clear the reservation */ +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE + andi. r4,r8,MSR_PR + beq 3f + CURRENT_THREAD_INFO(r4, r1) + ACCOUNT_CPU_USER_EXIT(r4, r5, r7) +3: +#endif lwz r4,_LINK(r1) lwz r5,_CCR(r1) mtlr r4 @@ -769,6 +782,10 @@ restore_user: andis. r10,r0,DBCR0_IDM@h bnel- load_dbcr0 #endif +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE + CURRENT_THREAD_INFO(r9, r1) + ACCOUNT_CPU_USER_EXIT(r9, r10, r11) +#endif b restore diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 73e461a3dfbb..fcb2887f5a33 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -72,7 +72,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) std r0,GPR0(r1) std r10,GPR1(r1) beq 2f /* if from kernel mode */ - ACCOUNT_CPU_USER_ENTRY(r10, r11) + ACCOUNT_CPU_USER_ENTRY(r13, r10, r11) 2: std r2,GPR2(r1) std r3,GPR3(r1) mfcr r2 @@ -246,7 +246,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) ld r4,_LINK(r1) beq- 1f - ACCOUNT_CPU_USER_EXIT(r11, r12) + ACCOUNT_CPU_USER_EXIT(r13, r11, r12) BEGIN_FTR_SECTION HMT_MEDIUM_LOW @@ -453,7 +453,7 @@ _GLOBAL(ret_from_kernel_thread) REST_NVGPRS(r1) mtlr r14 mr r3,r15 -#if defined(_CALL_ELF) && _CALL_ELF == 2 +#ifdef PPC64_ELF_ABI_v2 mr r12,r14 #endif blrl @@ -859,7 +859,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) BEGIN_FTR_SECTION mtspr SPRN_PPR,r2 /* Restore PPR */ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) - ACCOUNT_CPU_USER_EXIT(r2, r4) + ACCOUNT_CPU_USER_EXIT(r13, r2, r4) REST_GPR(13, r1) 1: mtspr SPRN_SRR1,r3 diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 488e6314f993..38a1f96430e1 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -386,7 +386,7 @@ exc_##n##_common: \ std r10,_NIP(r1); /* save SRR0 to stackframe */ \ std r11,_MSR(r1); /* save SRR1 to stackframe */ \ beq 2f; /* if from kernel mode */ \ - ACCOUNT_CPU_USER_ENTRY(r10,r11);/* accounting (uses cr0+eq) */ \ + ACCOUNT_CPU_USER_ENTRY(r13,r10,r11);/* accounting (uses cr0+eq) */ \ 2: ld r3,excf+EX_R10(r13); /* get back r10 */ \ ld r4,excf+EX_R11(r13); /* get back r11 */ \ mfspr r5,scratch; /* get back r13 */ \ @@ -453,7 +453,7 @@ exc_##n##_bad_stack: \ sth r1,PACA_TRAP_SAVE(r13); /* store trap */ \ b bad_stack_book3e; /* bad stack error */ -/* WARNING: If you change the layout of this stub, make sure you chcek +/* WARNING: If you change the layout of this stub, make sure you check * the debug exception handler which handles single stepping * into exceptions from userspace, and the MM code in * arch/powerpc/mm/tlb_nohash.c which patches the branch here @@ -1059,7 +1059,7 @@ fast_exception_return: andi. r6,r10,MSR_PR REST_2GPRS(6, r1) beq 1f - ACCOUNT_CPU_USER_EXIT(r10, r11) + ACCOUNT_CPU_USER_EXIT(r13, r10, r11) ld r0,GPR13(r1) 1: stdcx. r0,0,r1 /* to clear the reservation */ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 8bcc1b457115..6200e4925d26 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -107,25 +107,9 @@ BEGIN_FTR_SECTION beq 9f cmpwi cr3,r13,2 - - /* - * Check if last bit of HSPGR0 is set. This indicates whether we are - * waking up from winkle. - */ GET_PACA(r13) - clrldi r5,r13,63 - clrrdi r13,r13,1 - cmpwi cr4,r5,1 - mtspr SPRN_HSPRG0,r13 - - lbz r0,PACA_THREAD_IDLE_STATE(r13) - cmpwi cr2,r0,PNV_THREAD_NAP - bgt cr2,8f /* Either sleep or Winkle */ + bl pnv_restore_hyp_resource - /* Waking up from nap should not cause hypervisor state loss */ - bgt cr3,. - - /* Waking up from nap */ li r0,PNV_THREAD_RUNNING stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ @@ -143,13 +127,9 @@ BEGIN_FTR_SECTION /* Return SRR1 from power7_nap() */ mfspr r3,SPRN_SRR1 - beq cr3,2f - b power7_wakeup_noloss -2: b power7_wakeup_loss - - /* Fast Sleep wakeup on PowerNV */ -8: GET_PACA(r13) - b power7_wakeup_tb_loss + blt cr3,2f + b pnv_wakeup_loss +2: b pnv_wakeup_noloss 9: END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) @@ -351,6 +331,12 @@ hv_doorbell_trampoline: EXCEPTION_PROLOG_0(PACA_EXGEN) b h_doorbell_hv + . = 0xea0 +hv_virt_irq_trampoline: + SET_SCRATCH0(r13) + EXCEPTION_PROLOG_0(PACA_EXGEN) + b h_virt_irq_hv + /* We need to deal with the Altivec unavailable exception * here which is at 0xf20, thus in the middle of the * prolog code of the PerformanceMonitor one. A little @@ -601,6 +587,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) MASKABLE_EXCEPTION_HV_OOL(0xe82, h_doorbell) KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe82) + MASKABLE_EXCEPTION_HV_OOL(0xea2, h_virt_irq) + KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xea2) + /* moved from 0xf00 */ STD_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor) KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf00) @@ -680,6 +669,8 @@ _GLOBAL(__replay_interrupt) BEGIN_FTR_SECTION cmpwi r3,0xe80 beq h_doorbell_common + cmpwi r3,0xea0 + beq h_virt_irq_common FTR_SECTION_ELSE cmpwi r3,0xa00 beq doorbell_super_common @@ -754,6 +745,7 @@ kvmppc_skip_Hinterrupt: #else STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, unknown_exception) #endif + STD_EXCEPTION_COMMON_ASYNC(0xea0, h_virt_irq, do_IRQ) STD_EXCEPTION_COMMON_ASYNC(0xf00, performance_monitor, performance_monitor_exception) STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, instruction_breakpoint_exception) STD_EXCEPTION_COMMON(0x1502, denorm, unknown_exception) @@ -762,11 +754,6 @@ kvmppc_skip_Hinterrupt: #else STD_EXCEPTION_COMMON(0x1700, altivec_assist, unknown_exception) #endif -#ifdef CONFIG_CBE_RAS - STD_EXCEPTION_COMMON(0x1200, cbe_system_error, cbe_system_error_exception) - STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception) - STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception) -#endif /* CONFIG_CBE_RAS */ /* * Relocation-on interrupts: A subset of the interrupts can be delivered @@ -877,6 +864,12 @@ h_doorbell_relon_trampoline: EXCEPTION_PROLOG_0(PACA_EXGEN) b h_doorbell_relon_hv + . = 0x4ea0 +h_virt_irq_relon_trampoline: + SET_SCRATCH0(r13) + EXCEPTION_PROLOG_0(PACA_EXGEN) + b h_virt_irq_relon_hv + . = 0x4f00 performance_monitor_relon_pseries_trampoline: SET_SCRATCH0(r13) @@ -1131,12 +1124,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) bl vsx_unavailable_exception b ret_from_except - STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception) - STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception) - /* Equivalents to the above handlers for relocation-on interrupt vectors */ STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist) MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell) + MASKABLE_RELON_EXCEPTION_HV_OOL(0xea0, h_virt_irq) STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor) STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable) @@ -1170,6 +1161,15 @@ fwnmi_data_area: . = 0x8000 #endif /* defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) */ + STD_EXCEPTION_COMMON(0xf60, facility_unavailable, facility_unavailable_exception) + STD_EXCEPTION_COMMON(0xf80, hv_facility_unavailable, facility_unavailable_exception) + +#ifdef CONFIG_CBE_RAS + STD_EXCEPTION_COMMON(0x1200, cbe_system_error, cbe_system_error_exception) + STD_EXCEPTION_COMMON(0x1600, cbe_maintenance, cbe_maintenance_exception) + STD_EXCEPTION_COMMON(0x1800, cbe_thermal, cbe_thermal_exception) +#endif /* CONFIG_CBE_RAS */ + .globl hmi_exception_early hmi_exception_early: EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0xe60) @@ -1289,7 +1289,7 @@ machine_check_handle_early: GET_PACA(r13) ld r1,PACAR1(r13) li r3,PNV_THREAD_NAP - b power7_enter_nap_mode + b pnv_enter_arch207_idle_mode 4: #endif /* diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 3cb3b02a13dd..b3a663333d36 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1009,8 +1009,7 @@ static int fadump_invalidate_dump(struct fadump_mem_struct *fdm) } while (wait_time); if (rc) { - printk(KERN_ERR "Failed to invalidate firmware-assisted dump " - "rgistration. unexpected error(%d).\n", rc); + pr_err("Failed to invalidate firmware-assisted dump registration. Unexpected error (%d).\n", rc); return rc; } fw_dump.dump_active = 0; diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c index 1123a4d8d8dd..cc52d9795f88 100644 --- a/arch/powerpc/kernel/ftrace.c +++ b/arch/powerpc/kernel/ftrace.c @@ -144,6 +144,21 @@ __ftrace_make_nop(struct module *mod, return -EINVAL; } +#ifdef CC_USING_MPROFILE_KERNEL + /* When using -mkernel_profile there is no load to jump over */ + pop = PPC_INST_NOP; + + if (probe_kernel_read(&op, (void *)(ip - 4), 4)) { + pr_err("Fetching instruction at %lx failed.\n", ip - 4); + return -EFAULT; + } + + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */ + if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) { + pr_err("Unexpected instruction %08x around bl _mcount\n", op); + return -EINVAL; + } +#else /* * Our original call site looks like: * @@ -170,24 +185,10 @@ __ftrace_make_nop(struct module *mod, } if (op != PPC_INST_LD_TOC) { - unsigned int inst; - - if (probe_kernel_read(&inst, (void *)(ip - 4), 4)) { - pr_err("Fetching instruction at %lx failed.\n", ip - 4); - return -EFAULT; - } - - /* We expect either a mlfr r0, or a std r0, LRSAVE(r1) */ - if (inst != PPC_INST_MFLR && inst != PPC_INST_STD_LR) { - pr_err("Unexpected instructions around bl _mcount\n" - "when enabling dynamic ftrace!\t" - "(%08x,bl,%08x)\n", inst, op); - return -EINVAL; - } - - /* When using -mkernel_profile there is no load to jump over */ - pop = PPC_INST_NOP; + pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op); + return -EINVAL; } +#endif /* CC_USING_MPROFILE_KERNEL */ if (patch_instruction((unsigned int *)ip, pop)) { pr_err("Patching NOP failed.\n"); @@ -608,7 +609,7 @@ unsigned long __init arch_syscall_addr(int nr) } #endif /* CONFIG_FTRACE_SYSCALLS && CONFIG_PPC64 */ -#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2) +#ifdef PPC64_ELF_ABI_v1 char *arch_ftrace_match_adjust(char *str, const char *search) { if (str[0] == '.' && search[0] != '.') @@ -616,4 +617,4 @@ char *arch_ftrace_match_adjust(char *str, const char *search) else return str; } -#endif /* defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2) */ +#endif /* PPC64_ELF_ABI_v1 */ diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 2d14774af6b4..f765b0434731 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -401,7 +401,7 @@ generic_secondary_common_init: ld r12,CPU_SPEC_RESTORE(r23) cmpdi 0,r12,0 beq 3f -#if !defined(_CALL_ELF) || _CALL_ELF != 2 +#ifdef PPC64_ELF_ABI_v1 ld r12,0(r12) #endif mtctr r12 @@ -941,7 +941,7 @@ start_here_multiplatform: mtspr SPRN_SRR1,r4 RFI b . /* prevent speculative execution */ - + /* This is where all platforms converge execution */ start_here_common: @@ -951,9 +951,6 @@ start_here_common: /* Load the TOC (virtual address) */ ld r2,PACATOC(r13) - /* Do more system initializations in virtual mode */ - bl setup_system - /* Mark interrupts soft and hard disabled (they might be enabled * in the PACA when doing hotplug) */ diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 80c69472314e..43ddaae42baf 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -30,6 +30,7 @@ #include <asm/ppc_asm.h> #include <asm/asm-offsets.h> #include <asm/ptrace.h> +#include <asm/fixmap.h> /* Macro to make the code more readable. */ #ifdef CONFIG_8xx_CPU6 @@ -383,28 +384,57 @@ InstructionTLBMiss: EXCEPTION_EPILOG_0 rfi +/* + * Bottom part of DataStoreTLBMiss handler for IMMR area + * not enough space in the DataStoreTLBMiss area + */ +DTLBMissIMMR: + mtcr r10 + /* Set 512k byte guarded page and mark it valid */ + li r10, MD_PS512K | MD_GUARDED | MD_SVALID + MTSPR_CPU6(SPRN_MD_TWC, r10, r11) + mfspr r10, SPRN_IMMR /* Get current IMMR */ + rlwinm r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */ + ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY | \ + _PAGE_PRESENT | _PAGE_NO_CACHE + MTSPR_CPU6(SPRN_MD_RPN, r10, r11) /* Update TLB entry */ + + li r11, RPN_PATTERN + mtspr SPRN_DAR, r11 /* Tag DAR */ + EXCEPTION_EPILOG_0 + rfi + . = 0x1200 DataStoreTLBMiss: - mtspr SPRN_SPRG_SCRATCH2, r3 EXCEPTION_PROLOG_0 - mfcr r3 + mfcr r10 /* If we are faulting a kernel address, we have to use the * kernel page tables. */ - mfspr r10, SPRN_MD_EPN - IS_KERNEL(r11, r10) + mfspr r11, SPRN_MD_EPN + rlwinm r11, r11, 16, 0xfff8 +#ifndef CONFIG_PIN_TLB_IMMR + cmpli cr0, r11, VIRT_IMMR_BASE@h +#endif + cmpli cr7, r11, PAGE_OFFSET@h +#ifndef CONFIG_PIN_TLB_IMMR +_ENTRY(DTLBMiss_jmp) + beq- DTLBMissIMMR +#endif + bge- cr7, 4f + mfspr r11, SPRN_M_TW /* Get level 1 table */ - BRANCH_UNLESS_KERNEL(3f) - lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha 3: + mtcr r10 +#ifdef CONFIG_8xx_CPU6 + mtspr SPRN_SPRG_SCRATCH2, r3 +#endif + mfspr r10, SPRN_MD_EPN /* Insert level 1 index */ rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29 lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */ - mtcr r11 - bt- 28,DTLBMiss8M /* bit 28 = Large page (8M) */ - mtcr r3 /* We have a pte table, so load fetch the pte from the table. */ @@ -452,29 +482,30 @@ DataStoreTLBMiss: MTSPR_CPU6(SPRN_MD_RPN, r10, r3) /* Update TLB entry */ /* Restore registers */ +#ifdef CONFIG_8xx_CPU6 mfspr r3, SPRN_SPRG_SCRATCH2 +#endif mtspr SPRN_DAR, r11 /* Tag DAR */ EXCEPTION_EPILOG_0 rfi -DTLBMiss8M: - mtcr r3 - ori r11, r11, MD_SVALID - MTSPR_CPU6(SPRN_MD_TWC, r11, r3) -#ifdef CONFIG_PPC_16K_PAGES - /* - * In 16k pages mode, each PGD entry defines a 64M block. - * Here we select the 8M page within the block. - */ - rlwimi r11, r10, 0, 0x03800000 -#endif - rlwinm r10, r11, 0, 0xff800000 +4: +_ENTRY(DTLBMiss_cmp) + cmpli cr0, r11, (PAGE_OFFSET + 0x1800000)@h + lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha + bge- 3b + + mtcr r10 + /* Set 8M byte page and mark it valid */ + li r10, MD_PS8MEG | MD_SVALID + MTSPR_CPU6(SPRN_MD_TWC, r10, r11) + mfspr r10, SPRN_MD_EPN + rlwinm r10, r10, 0, 0x0f800000 /* 8xx supports max 256Mb RAM */ ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SHARED | _PAGE_DIRTY | \ _PAGE_PRESENT - MTSPR_CPU6(SPRN_MD_RPN, r10, r3) /* Update TLB entry */ + MTSPR_CPU6(SPRN_MD_RPN, r10, r11) /* Update TLB entry */ li r11, RPN_PATTERN - mfspr r3, SPRN_SPRG_SCRATCH2 mtspr SPRN_DAR, r11 /* Tag DAR */ EXCEPTION_EPILOG_0 rfi @@ -553,12 +584,14 @@ FixupDAR:/* Entry point for dcbx workaround. */ IS_KERNEL(r11, r10) mfspr r11, SPRN_M_TW /* Get level 1 table */ BRANCH_UNLESS_KERNEL(3f) + rlwinm r11, r10, 16, 0xfff8 +_ENTRY(FixupDAR_cmp) + cmpli cr7, r11, (PAGE_OFFSET + 0x1800000)@h + blt- cr7, 200f lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha /* Insert level 1 index */ 3: rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29 lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11) /* Get the level 1 entry */ - mtcr r11 - bt 28,200f /* bit 28 = Large page (8M) */ rlwinm r11, r11,0,0,19 /* Extract page descriptor page address */ /* Insert level 2 index */ rlwimi r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29 @@ -584,8 +617,8 @@ FixupDAR:/* Entry point for dcbx workaround. */ 141: mfspr r10,SPRN_SPRG_SCRATCH2 b DARFixed /* Nope, go back to normal TLB processing */ - /* concat physical page address(r11) and page offset(r10) */ -200: rlwimi r11, r10, 0, 32 - (PAGE_SHIFT << 1), 31 + /* create physical page address from effective address */ +200: tophys(r11, r10) b 201b 144: mfspr r10, SPRN_DSISR @@ -763,10 +796,18 @@ start_here: * virtual to physical. Also, set the cache mode since that is defined * by TLB entries and perform any additional mapping (like of the IMMR). * If configured to pin some TLBs, we pin the first 8 Mbytes of kernel, - * 24 Mbytes of data, and the 8M IMMR space. Anything not covered by + * 24 Mbytes of data, and the 512k IMMR space. Anything not covered by * these mappings is mapped by page tables. */ initial_mmu: + li r8, 0 + mtspr SPRN_MI_CTR, r8 /* remove PINNED ITLB entries */ + lis r10, MD_RESETVAL@h +#ifndef CONFIG_8xx_COPYBACK + oris r10, r10, MD_WTDEF@h +#endif + mtspr SPRN_MD_CTR, r10 /* remove PINNED DTLB entries */ + tlbia /* Invalidate all TLB entries */ /* Always pin the first 8 MB ITLB to prevent ITLB misses while mucking around with SRR0/SRR1 in asm @@ -777,34 +818,20 @@ initial_mmu: mtspr SPRN_MI_CTR, r8 /* Set instruction MMU control */ #ifdef CONFIG_PIN_TLB - lis r10, (MD_RSV4I | MD_RESETVAL)@h - ori r10, r10, 0x1c00 - mr r8, r10 -#else - lis r10, MD_RESETVAL@h -#endif -#ifndef CONFIG_8xx_COPYBACK - oris r10, r10, MD_WTDEF@h -#endif + oris r10, r10, MD_RSV4I@h mtspr SPRN_MD_CTR, r10 /* Set data TLB control */ +#endif - /* Now map the lower 8 Meg into the TLBs. For this quick hack, - * we can load the instruction and data TLB registers with the - * same values. - */ + /* Now map the lower 8 Meg into the ITLB. */ lis r8, KERNELBASE@h /* Create vaddr for TLB */ ori r8, r8, MI_EVALID /* Mark it valid */ mtspr SPRN_MI_EPN, r8 - mtspr SPRN_MD_EPN, r8 li r8, MI_PS8MEG | (2 << 5) /* Set 8M byte page, APG 2 */ ori r8, r8, MI_SVALID /* Make it valid */ mtspr SPRN_MI_TWC, r8 - li r8, MI_PS8MEG /* Set 8M byte page, APG 0 */ - ori r8, r8, MI_SVALID /* Make it valid */ - mtspr SPRN_MD_TWC, r8 li r8, MI_BOOTINIT /* Create RPN for address 0 */ mtspr SPRN_MI_RPN, r8 /* Store TLB entry */ - mtspr SPRN_MD_RPN, r8 + lis r8, MI_APG_INIT@h /* Set protection modes */ ori r8, r8, MI_APG_INIT@l mtspr SPRN_MI_AP, r8 @@ -812,51 +839,25 @@ initial_mmu: ori r8, r8, MD_APG_INIT@l mtspr SPRN_MD_AP, r8 - /* Map another 8 MByte at the IMMR to get the processor + /* Map a 512k page for the IMMR to get the processor * internal registers (among other things). */ -#ifdef CONFIG_PIN_TLB - addi r10, r10, 0x0100 +#ifdef CONFIG_PIN_TLB_IMMR + ori r10, r10, 0x1c00 mtspr SPRN_MD_CTR, r10 -#endif + mfspr r9, 638 /* Get current IMMR */ - andis. r9, r9, 0xff80 /* Get 8Mbyte boundary */ + andis. r9, r9, 0xfff8 /* Get 512 kbytes boundary */ - mr r8, r9 /* Create vaddr for TLB */ + lis r8, VIRT_IMMR_BASE@h /* Create vaddr for TLB */ ori r8, r8, MD_EVALID /* Mark it valid */ mtspr SPRN_MD_EPN, r8 - li r8, MD_PS8MEG /* Set 8M byte page */ + li r8, MD_PS512K | MD_GUARDED /* Set 512k byte page */ ori r8, r8, MD_SVALID /* Make it valid */ mtspr SPRN_MD_TWC, r8 mr r8, r9 /* Create paddr for TLB */ ori r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */ mtspr SPRN_MD_RPN, r8 - -#ifdef CONFIG_PIN_TLB - /* Map two more 8M kernel data pages. - */ - addi r10, r10, 0x0100 - mtspr SPRN_MD_CTR, r10 - - lis r8, KERNELBASE@h /* Create vaddr for TLB */ - addis r8, r8, 0x0080 /* Add 8M */ - ori r8, r8, MI_EVALID /* Mark it valid */ - mtspr SPRN_MD_EPN, r8 - li r9, MI_PS8MEG /* Set 8M byte page */ - ori r9, r9, MI_SVALID /* Make it valid */ - mtspr SPRN_MD_TWC, r9 - li r11, MI_BOOTINIT /* Create RPN for address 0 */ - addis r11, r11, 0x0080 /* Add 8M */ - mtspr SPRN_MD_RPN, r11 - - addi r10, r10, 0x0100 - mtspr SPRN_MD_CTR, r10 - - addis r8, r8, 0x0080 /* Add 8M */ - mtspr SPRN_MD_EPN, r8 - mtspr SPRN_MD_TWC, r9 - addis r11, r11, 0x0080 /* Add 8M */ - mtspr SPRN_MD_RPN, r11 #endif /* Since the cache is enabled according to the information we diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_book3s.S index 470ceebd2d23..335eb6cedae5 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -1,5 +1,6 @@ /* - * This file contains the power_save function for Power7 CPUs. + * This file contains idle entry/exit functions for POWER7, + * POWER8 and POWER9 CPUs. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -20,6 +21,7 @@ #include <asm/opal.h> #include <asm/cpuidle.h> #include <asm/book3s/64/mmu-hash.h> +#include <asm/mmu.h> #undef DEBUG @@ -36,6 +38,11 @@ #define _AMOR GPR9 #define _WORT GPR10 #define _WORC GPR11 +#define _PTCR GPR12 + +#define PSSCR_HV_TEMPLATE PSSCR_ESL | PSSCR_EC | \ + PSSCR_PSLL_MASK | PSSCR_TR_MASK | \ + PSSCR_MTL_MASK /* Idle state entry routines */ @@ -52,6 +59,45 @@ .text /* + * Used by threads before entering deep idle states. Saves SPRs + * in interrupt stack frame + */ +save_sprs_to_stack: + /* + * Note all register i.e per-core, per-subcore or per-thread is saved + * here since any thread in the core might wake up first + */ +BEGIN_FTR_SECTION + mfspr r3,SPRN_PTCR + std r3,_PTCR(r1) + /* + * Note - SDR1 is dropped in Power ISA v3. Hence not restoring + * SDR1 here + */ +FTR_SECTION_ELSE + mfspr r3,SPRN_SDR1 + std r3,_SDR1(r1) +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) + mfspr r3,SPRN_RPR + std r3,_RPR(r1) + mfspr r3,SPRN_SPURR + std r3,_SPURR(r1) + mfspr r3,SPRN_PURR + std r3,_PURR(r1) + mfspr r3,SPRN_TSCR + std r3,_TSCR(r1) + mfspr r3,SPRN_DSCR + std r3,_DSCR(r1) + mfspr r3,SPRN_AMOR + std r3,_AMOR(r1) + mfspr r3,SPRN_WORT + std r3,_WORT(r1) + mfspr r3,SPRN_WORC + std r3,_WORC(r1) + + blr + +/* * Used by threads when the lock bit of core_idle_state is set. * Threads will spin in HMT_LOW until the lock bit is cleared. * r14 - pointer to core_idle_state @@ -69,13 +115,16 @@ core_idle_lock_held: /* * Pass requested state in r3: - * r3 - PNV_THREAD_NAP/SLEEP/WINKLE + * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8 + * - Requested STOP state in POWER9 * * To check IRQ_HAPPENED in r4 * 0 - don't check * 1 - check + * + * Address to 'rfid' to in r5 */ -_GLOBAL(power7_powersave_common) +_GLOBAL(pnv_powersave_common) /* Use r3 to pass state nap/sleep/winkle */ /* NAP is a state loss, we create a regs frame on the * stack, fill it up with the state we care about and @@ -126,28 +175,28 @@ _GLOBAL(power7_powersave_common) std r9,_MSR(r1) std r1,PACAR1(r13) +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + /* Tell KVM we're entering idle */ + li r4,KVM_HWTHREAD_IN_IDLE + stb r4,HSTATE_HWTHREAD_STATE(r13) +#endif + /* * Go to real mode to do the nap, as required by the architecture. * Also, we need to be in real mode before setting hwthread_state, * because as soon as we do that, another thread can switch * the MMU context to the guest. */ - LOAD_REG_IMMEDIATE(r5, MSR_IDLE) + LOAD_REG_IMMEDIATE(r7, MSR_IDLE) li r6, MSR_RI andc r6, r9, r6 - LOAD_REG_ADDR(r7, power7_enter_nap_mode) mtmsrd r6, 1 /* clear RI before setting SRR0/1 */ - mtspr SPRN_SRR0, r7 - mtspr SPRN_SRR1, r5 + mtspr SPRN_SRR0, r5 + mtspr SPRN_SRR1, r7 rfid - .globl power7_enter_nap_mode -power7_enter_nap_mode: -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE - /* Tell KVM we're napping */ - li r4,KVM_HWTHREAD_IN_NAP - stb r4,HSTATE_HWTHREAD_STATE(r13) -#endif + .globl pnv_enter_arch207_idle_mode +pnv_enter_arch207_idle_mode: stb r3,PACA_THREAD_IDLE_STATE(r13) cmpwi cr3,r3,PNV_THREAD_SLEEP bge cr3,2f @@ -196,8 +245,7 @@ fastsleep_workaround_at_entry: /* Fast sleep workaround */ li r3,1 li r4,1 - li r0,OPAL_CONFIG_CPU_IDLE_STATE - bl opal_call_realmode + bl opal_rm_config_cpu_idle_state /* Clear Lock bit */ li r0,0 @@ -206,30 +254,45 @@ fastsleep_workaround_at_entry: b common_enter enter_winkle: - /* - * Note all register i.e per-core, per-subcore or per-thread is saved - * here since any thread in the core might wake up first - */ - mfspr r3,SPRN_SDR1 - std r3,_SDR1(r1) - mfspr r3,SPRN_RPR - std r3,_RPR(r1) - mfspr r3,SPRN_SPURR - std r3,_SPURR(r1) - mfspr r3,SPRN_PURR - std r3,_PURR(r1) - mfspr r3,SPRN_TSCR - std r3,_TSCR(r1) - mfspr r3,SPRN_DSCR - std r3,_DSCR(r1) - mfspr r3,SPRN_AMOR - std r3,_AMOR(r1) - mfspr r3,SPRN_WORT - std r3,_WORT(r1) - mfspr r3,SPRN_WORC - std r3,_WORC(r1) + bl save_sprs_to_stack + IDLE_STATE_ENTER_SEQ(PPC_WINKLE) +/* + * r3 - requested stop state + */ +power_enter_stop: +/* + * Check if the requested state is a deep idle state. + */ + LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state) + ld r4,ADDROFF(pnv_first_deep_stop_state)(r5) + cmpd r3,r4 + bge 2f + IDLE_STATE_ENTER_SEQ(PPC_STOP) +2: +/* + * Entering deep idle state. + * Clear thread bit in PACA_CORE_IDLE_STATE, save SPRs to + * stack and enter stop + */ + lbz r7,PACA_THREAD_MASK(r13) + ld r14,PACA_CORE_IDLE_STATE_PTR(r13) + +lwarx_loop_stop: + lwarx r15,0,r14 + andi. r9,r15,PNV_CORE_IDLE_LOCK_BIT + bnel core_idle_lock_held + andc r15,r15,r7 /* Clear thread bit */ + + stwcx. r15,0,r14 + bne- lwarx_loop_stop + isync + + bl save_sprs_to_stack + + IDLE_STATE_ENTER_SEQ(PPC_STOP) + _GLOBAL(power7_idle) /* Now check if user or arch enabled NAP mode */ LOAD_REG_ADDRBASE(r3,powersave_nap) @@ -242,19 +305,22 @@ _GLOBAL(power7_idle) _GLOBAL(power7_nap) mr r4,r3 li r3,PNV_THREAD_NAP - b power7_powersave_common + LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode) + b pnv_powersave_common /* No return */ _GLOBAL(power7_sleep) li r3,PNV_THREAD_SLEEP li r4,1 - b power7_powersave_common + LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode) + b pnv_powersave_common /* No return */ _GLOBAL(power7_winkle) - li r3,3 + li r3,PNV_THREAD_WINKLE li r4,1 - b power7_powersave_common + LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode) + b pnv_powersave_common /* No return */ #define CHECK_HMI_INTERRUPT \ @@ -270,25 +336,104 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \ ld r2,PACATOC(r13); \ ld r1,PACAR1(r13); \ std r3,ORIG_GPR3(r1); /* Save original r3 */ \ - li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \ - bl opal_call_realmode; \ + bl opal_rm_handle_hmi; \ ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \ 20: nop; -_GLOBAL(power7_wakeup_tb_loss) +/* + * r3 - requested stop state + */ +_GLOBAL(power9_idle_stop) + LOAD_REG_IMMEDIATE(r4, PSSCR_HV_TEMPLATE) + or r4,r4,r3 + mtspr SPRN_PSSCR, r4 + li r4, 1 + LOAD_REG_ADDR(r5,power_enter_stop) + b pnv_powersave_common + /* No return */ +/* + * Called from reset vector. Check whether we have woken up with + * hypervisor state loss. If yes, restore hypervisor state and return + * back to reset vector. + * + * r13 - Contents of HSPRG0 + * cr3 - set to gt if waking up with partial/complete hypervisor state loss + */ +_GLOBAL(pnv_restore_hyp_resource) ld r2,PACATOC(r13); +BEGIN_FTR_SECTION + /* + * POWER ISA 3. Use PSSCR to determine if we + * are waking up from deep idle state + */ + LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state) + ld r4,ADDROFF(pnv_first_deep_stop_state)(r5) + + mfspr r5,SPRN_PSSCR + /* + * 0-3 bits correspond to Power-Saving Level Status + * which indicates the idle state we are waking up from + */ + rldicl r5,r5,4,60 + cmpd cr4,r5,r4 + bge cr4,pnv_wakeup_tb_loss + /* + * Waking up without hypervisor state loss. Return to + * reset vector + */ + blr + +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) + + /* + * POWER ISA 2.07 or less. + * Check if last bit of HSPGR0 is set. This indicates whether we are + * waking up from winkle. + */ + clrldi r5,r13,63 + clrrdi r13,r13,1 + cmpwi cr4,r5,1 + mtspr SPRN_HSPRG0,r13 + + lbz r0,PACA_THREAD_IDLE_STATE(r13) + cmpwi cr2,r0,PNV_THREAD_NAP + bgt cr2,pnv_wakeup_tb_loss /* Either sleep or Winkle */ + + /* + * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking + * up from nap. At this stage CR3 shouldn't contains 'gt' since that + * indicates we are waking with hypervisor state loss from nap. + */ + bgt cr3,. + + blr /* Return back to System Reset vector from where + pnv_restore_hyp_resource was invoked */ + +/* + * Called if waking up from idle state which can cause either partial or + * complete hyp state loss. + * In POWER8, called if waking up from fastsleep or winkle + * In POWER9, called if waking up from stop state >= pnv_first_deep_stop_state + * + * r13 - PACA + * cr3 - gt if waking up with partial/complete hypervisor state loss + * cr4 - eq if waking up from complete hypervisor state loss. + */ +_GLOBAL(pnv_wakeup_tb_loss) ld r1,PACAR1(r13) /* * Before entering any idle state, the NVGPRs are saved in the stack * and they are restored before switching to the process context. Hence * until they are restored, they are free to be used. * - * Save SRR1 in a NVGPR as it might be clobbered in opal_call_realmode - * (called in CHECK_HMI_INTERRUPT). SRR1 is required to determine the - * wakeup reason if we branch to kvm_start_guest. + * Save SRR1 and LR in NVGPRs as they might be clobbered in + * opal_call() (called in CHECK_HMI_INTERRUPT). SRR1 is required + * to determine the wakeup reason if we branch to kvm_start_guest. LR + * is required to return back to reset vector after hypervisor state + * restore is complete. */ - + mflr r17 mfspr r16,SPRN_SRR1 BEGIN_FTR_SECTION CHECK_HMI_INTERRUPT @@ -310,35 +455,35 @@ lwarx_loop2: bnel core_idle_lock_held cmpwi cr2,r15,0 - lbz r4,PACA_SUBCORE_SIBLING_MASK(r13) - and r4,r4,r15 - cmpwi cr1,r4,0 /* Check if first in subcore */ /* * At this stage - * cr1 - 0b0100 if first thread to wakeup in subcore - * cr2 - 0b0100 if first thread to wakeup in core - * cr3- 0b0010 if waking up from sleep or winkle - * cr4 - 0b0100 if waking up from winkle + * cr2 - eq if first thread to wakeup in core + * cr3- gt if waking up with partial/complete hypervisor state loss + * cr4 - eq if waking up from complete hypervisor state loss. */ - or r15,r15,r7 /* Set thread bit */ - - beq cr1,first_thread_in_subcore - - /* Not first thread in subcore to wake up */ - stwcx. r15,0,r14 - bne- lwarx_loop2 - isync - b common_exit - -first_thread_in_subcore: - /* First thread in subcore to wakeup */ ori r15,r15,PNV_CORE_IDLE_LOCK_BIT stwcx. r15,0,r14 bne- lwarx_loop2 isync +BEGIN_FTR_SECTION + lbz r4,PACA_SUBCORE_SIBLING_MASK(r13) + and r4,r4,r15 + cmpwi r4,0 /* Check if first in subcore */ + + or r15,r15,r7 /* Set thread bit */ + beq first_thread_in_subcore +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) + + or r15,r15,r7 /* Set thread bit */ + beq cr2,first_thread_in_core + + /* Not first thread in core or subcore to wake up */ + b clear_lock + +first_thread_in_subcore: /* * If waking up from sleep, subcore state is not lost. Hence * skip subcore state restore @@ -348,6 +493,7 @@ first_thread_in_subcore: /* Restore per-subcore state */ ld r4,_SDR1(r1) mtspr SPRN_SDR1,r4 + ld r4,_RPR(r1) mtspr SPRN_RPR,r4 ld r4,_AMOR(r1) @@ -363,32 +509,44 @@ subcore_state_restored: first_thread_in_core: /* - * First thread in the core waking up from fastsleep. It needs to + * First thread in the core waking up from any state which can cause + * partial or complete hypervisor state loss. It needs to * call the fastsleep workaround code if the platform requires it. * Call it unconditionally here. The below branch instruction will - * be patched out when the idle states are discovered if platform - * does not require workaround. + * be patched out if the platform does not have fastsleep or does not + * require the workaround. Patching will be performed during the + * discovery of idle-states. */ .global pnv_fastsleep_workaround_at_exit pnv_fastsleep_workaround_at_exit: b fastsleep_workaround_at_exit timebase_resync: - /* Do timebase resync if we are waking up from sleep. Use cr3 value - * set in exceptions-64s.S */ + /* + * Use cr3 which indicates that we are waking up with atleast partial + * hypervisor state loss to determine if TIMEBASE RESYNC is needed. + */ ble cr3,clear_lock /* Time base re-sync */ - li r0,OPAL_RESYNC_TIMEBASE - bl opal_call_realmode; - /* TODO: Check r3 for failure */ - + bl opal_rm_resync_timebase; /* * If waking up from sleep, per core state is not lost, skip to * clear_lock. */ bne cr4,clear_lock - /* Restore per core state */ + /* + * First thread in the core to wake up and its waking up with + * complete hypervisor state loss. Restore per core hypervisor + * state. + */ +BEGIN_FTR_SECTION + ld r4,_PTCR(r1) + mtspr SPRN_PTCR,r4 + ld r4,_RPR(r1) + mtspr SPRN_RPR,r4 +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) + ld r4,_TSCR(r1) mtspr SPRN_TSCR,r4 ld r4,_WORC(r1) @@ -410,9 +568,9 @@ common_exit: /* Waking up from winkle */ - /* Restore per thread state */ - bl __restore_cpu_power8 - +BEGIN_MMU_FTR_SECTION + b no_segments +END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX) /* Restore SLB from PACA */ ld r8,PACA_SLBSHADOWPTR(r13) @@ -426,6 +584,9 @@ common_exit: slbmte r6,r5 1: addi r8,r8,16 .endr +no_segments: + + /* Restore per thread state */ ld r4,_SPURR(r1) mtspr SPRN_SPURR,r4 @@ -436,48 +597,34 @@ common_exit: ld r4,_WORT(r1) mtspr SPRN_WORT,r4 -hypervisor_state_restored: + /* Call cur_cpu_spec->cpu_restore() */ + LOAD_REG_ADDR(r4, cur_cpu_spec) + ld r4,0(r4) + ld r12,CPU_SPEC_RESTORE(r4) +#ifdef PPC64_ELF_ABI_v1 + ld r12,0(r12) +#endif + mtctr r12 + bctrl - li r5,PNV_THREAD_RUNNING - stb r5,PACA_THREAD_IDLE_STATE(r13) +hypervisor_state_restored: mtspr SPRN_SRR1,r16 -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE - li r0,KVM_HWTHREAD_IN_KERNEL - stb r0,HSTATE_HWTHREAD_STATE(r13) - /* Order setting hwthread_state vs. testing hwthread_req */ - sync - lbz r0,HSTATE_HWTHREAD_REQ(r13) - cmpwi r0,0 - beq 6f - b kvm_start_guest -6: -#endif - - REST_NVGPRS(r1) - REST_GPR(2, r1) - ld r3,_CCR(r1) - ld r4,_MSR(r1) - ld r5,_NIP(r1) - addi r1,r1,INT_FRAME_SIZE - mtcr r3 - mfspr r3,SPRN_SRR1 /* Return SRR1 */ - mtspr SPRN_SRR1,r4 - mtspr SPRN_SRR0,r5 - rfid + mtlr r17 + blr /* Return back to System Reset vector from where + pnv_restore_hyp_resource was invoked */ fastsleep_workaround_at_exit: li r3,1 li r4,0 - li r0,OPAL_CONFIG_CPU_IDLE_STATE - bl opal_call_realmode + bl opal_rm_config_cpu_idle_state b timebase_resync /* * R3 here contains the value that will be returned to the caller * of power7_nap. */ -_GLOBAL(power7_wakeup_loss) +_GLOBAL(pnv_wakeup_loss) ld r1,PACAR1(r13) BEGIN_FTR_SECTION CHECK_HMI_INTERRUPT @@ -497,10 +644,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) * R3 here contains the value that will be returned to the caller * of power7_nap. */ -_GLOBAL(power7_wakeup_noloss) +_GLOBAL(pnv_wakeup_noloss) lbz r0,PACA_NAPSTATELOST(r13) cmpwi r0,0 - bne power7_wakeup_loss + bne pnv_wakeup_loss BEGIN_FTR_SECTION CHECK_HMI_INTERRUPT END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 3cb46a3b1de7..ac910d9982df 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -250,7 +250,7 @@ notrace void arch_local_irq_restore(unsigned long en) if (WARN_ON(mfmsr() & MSR_EE)) __hard_irq_disable(); } -#endif /* CONFIG_TRACE_IRQFLAG */ +#endif /* CONFIG_TRACE_IRQFLAGS */ set_soft_enabled(0); @@ -342,6 +342,21 @@ bool prep_irq_for_idle(void) return true; } +/* + * Force a replay of the external interrupt handler on this CPU. + */ +void force_external_irq_replay(void) +{ + /* + * This must only be called with interrupts soft-disabled, + * the replay will happen when re-enabling. + */ + WARN_ON(!arch_irqs_disabled()); + + /* Indicate in the PACA that we have an interrupt to replay */ + local_paca->irq_happened |= PACA_IRQ_EE; +} + #endif /* CONFIG_PPC64 */ int arch_show_interrupts(struct seq_file *p, int prec) diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index 7c053f281406..3ed8ec09b5c9 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -278,12 +278,11 @@ no_kprobe: * - When the probed function returns, this probe * causes the handlers to fire */ -static void __used kretprobe_trampoline_holder(void) -{ - asm volatile(".global kretprobe_trampoline\n" - "kretprobe_trampoline:\n" - "nop\n"); -} +asm(".global kretprobe_trampoline\n" + ".type kretprobe_trampoline, @function\n" + "kretprobe_trampoline:\n" + "nop\n" + ".size kretprobe_trampoline, .-kretprobe_trampoline\n"); /* * Called when the probe at kretprobe trampoline is hit @@ -506,13 +505,11 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs) /* setup return addr to the jprobe handler routine */ regs->nip = arch_deref_entry_point(jp->entry); -#ifdef CONFIG_PPC64 -#if defined(_CALL_ELF) && _CALL_ELF == 2 +#ifdef PPC64_ELF_ABI_v2 regs->gpr[12] = (unsigned long)jp->entry; -#else +#elif defined(PPC64_ELF_ABI_v1) regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc); #endif -#endif return 1; } diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index b8c202d63ecb..4c780a342282 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -29,6 +29,7 @@ #include <asm/prom.h> #include <asm/smp.h> #include <asm/hw_breakpoint.h> +#include <asm/asm-prototypes.h> #ifdef CONFIG_PPC_BOOK3E int default_machine_kexec_prepare(struct kimage *image) @@ -54,7 +55,7 @@ int default_machine_kexec_prepare(struct kimage *image) const unsigned long *basep; const unsigned int *sizep; - if (!ppc_md.hpte_clear_all) + if (!mmu_hash_ops.hpte_clear_all) return -ENOENT; /* @@ -379,7 +380,12 @@ void default_machine_kexec(struct kimage *image) */ kexec_sequence(&kexec_stack, image->start, image, page_address(image->control_code_page), - ppc_md.hpte_clear_all); +#ifdef CONFIG_PPC_STD_MMU + mmu_hash_ops.hpte_clear_all +#else + NULL +#endif + ); /* NOTREACHED */ } diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 285ca8c6cc2e..d9c912b6e632 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -104,20 +104,6 @@ _GLOBAL(mulhdu) blr /* - * sub_reloc_offset(x) returns x - reloc_offset(). - */ -_GLOBAL(sub_reloc_offset) - mflr r0 - bl 1f -1: mflr r5 - lis r4,1b@ha - addi r4,r4,1b@l - subf r5,r4,r5 - subf r3,r5,r3 - mtlr r0 - blr - -/* * reloc_got2 runs through the .got2 section adding an offset * to each entry. */ diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index f28754c497e5..cb195157b318 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -661,13 +661,13 @@ _GLOBAL(kexec_sequence) #ifndef CONFIG_PPC_BOOK3E /* clear out hardware hash page table and tlb */ -#if !defined(_CALL_ELF) || _CALL_ELF != 2 +#ifdef PPC64_ELF_ABI_v1 ld r12,0(r27) /* deref function descriptor */ #else mr r12,r27 #endif mtctr r12 - bctrl /* ppc_md.hpte_clear_all(void); */ + bctrl /* mmu_hash_ops.hpte_clear_all(void); */ #endif /* !CONFIG_PPC_BOOK3E */ /* diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 9ce9a25f58b5..183368e008cf 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -41,7 +41,7 @@ this, and makes other things simpler. Anton? --RR. */ -#if defined(_CALL_ELF) && _CALL_ELF == 2 +#ifdef PPC64_ELF_ABI_v2 /* An address is simply the address of the function. */ typedef unsigned long func_desc_t; @@ -132,7 +132,7 @@ static u32 ppc64_stub_insns[] = { /* Save current r2 value in magic place on the stack. */ 0xf8410000|R2_STACK_OFFSET, /* std r2,R2_STACK_OFFSET(r1) */ 0xe98b0020, /* ld r12,32(r11) */ -#if !defined(_CALL_ELF) || _CALL_ELF != 2 +#ifdef PPC64_ELF_ABI_v1 /* Set up new r2 from function descriptor */ 0xe84b0028, /* ld r2,40(r11) */ #endif @@ -494,9 +494,10 @@ static bool is_early_mcount_callsite(u32 *instruction) restore r2. */ static int restore_r2(u32 *instruction, struct module *me) { + if (is_early_mcount_callsite(instruction - 1)) + return 1; + if (*instruction != PPC_INST_NOP) { - if (is_early_mcount_callsite(instruction - 1)) - return 1; pr_err("%s: Expect noop after relocate, got %08x\n", me->name, *instruction); return 0; diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 0f7a60f1e9f6..f93942b4b6a6 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -41,11 +41,18 @@ #include <asm/ppc-pci.h> #include <asm/eeh.h> +/* hose_spinlock protects accesses to the the phb_bitmap. */ static DEFINE_SPINLOCK(hose_spinlock); LIST_HEAD(hose_list); -/* XXX kill that some day ... */ -static int global_phb_number; /* Global phb counter */ +/* For dynamic PHB numbering on get_phb_number(): max number of PHBs. */ +#define MAX_PHBS 0x10000 + +/* + * For dynamic PHB numbering: used/free PHBs tracking bitmap. + * Accesses to this bitmap should be protected by hose_spinlock. + */ +static DECLARE_BITMAP(phb_bitmap, MAX_PHBS); /* ISA Memory physical address */ resource_size_t isa_mem_base; @@ -64,6 +71,42 @@ struct dma_map_ops *get_pci_dma_ops(void) } EXPORT_SYMBOL(get_pci_dma_ops); +/* + * This function should run under locking protection, specifically + * hose_spinlock. + */ +static int get_phb_number(struct device_node *dn) +{ + int ret, phb_id = -1; + u64 prop; + + /* + * Try fixed PHB numbering first, by checking archs and reading + * the respective device-tree properties. Firstly, try powernv by + * reading "ibm,opal-phbid", only present in OPAL environment. + */ + ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop); + if (ret) + ret = of_property_read_u32_index(dn, "reg", 1, (u32 *)&prop); + + if (!ret) + phb_id = (int)(prop & (MAX_PHBS - 1)); + + /* We need to be sure to not use the same PHB number twice. */ + if ((phb_id >= 0) && !test_and_set_bit(phb_id, phb_bitmap)) + return phb_id; + + /* + * If not pseries nor powernv, or if fixed PHB numbering tried to add + * the same PHB number twice, then fallback to dynamic PHB numbering. + */ + phb_id = find_first_zero_bit(phb_bitmap, MAX_PHBS); + BUG_ON(phb_id >= MAX_PHBS); + set_bit(phb_id, phb_bitmap); + + return phb_id; +} + struct pci_controller *pcibios_alloc_controller(struct device_node *dev) { struct pci_controller *phb; @@ -72,7 +115,7 @@ struct pci_controller *pcibios_alloc_controller(struct device_node *dev) if (phb == NULL) return NULL; spin_lock(&hose_spinlock); - phb->global_number = global_phb_number++; + phb->global_number = get_phb_number(dev); list_add_tail(&phb->list_node, &hose_list); spin_unlock(&hose_spinlock); phb->dn = dev; @@ -94,6 +137,11 @@ EXPORT_SYMBOL_GPL(pcibios_alloc_controller); void pcibios_free_controller(struct pci_controller *phb) { spin_lock(&hose_spinlock); + + /* Clear bit of phb_bitmap to allow reuse of this PHB number. */ + if (phb->global_number < MAX_PHBS) + clear_bit(phb->global_number, phb_bitmap); + list_del(&phb->list_node); spin_unlock(&hose_spinlock); @@ -124,6 +172,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus, return 1; } +void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type) +{ + struct pci_controller *hose = pci_bus_to_host(bus); + + if (hose->controller_ops.setup_bridge) + hose->controller_ops.setup_bridge(bus, type); +} + void pcibios_reset_secondary_bus(struct pci_dev *dev) { struct pci_controller *phb = pci_bus_to_host(dev->bus); @@ -1362,8 +1418,10 @@ void __init pcibios_resource_survey(void) /* Allocate and assign resources */ list_for_each_entry(b, &pci_root_buses, node) pcibios_allocate_bus_resources(b); - pcibios_allocate_resources(0); - pcibios_allocate_resources(1); + if (!pci_has_flag(PCI_REASSIGN_ALL_RSRC)) { + pcibios_allocate_resources(0); + pcibios_allocate_resources(1); + } /* Before we start assigning unassigned resource, we try to reserve * the low IO area and the VGA memory area if they intersect the @@ -1436,8 +1494,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus) /* Allocate bus and devices resources */ pcibios_allocate_bus_resources(bus); pcibios_claim_one_bus(bus); - if (!pci_has_flag(PCI_PROBE_ONLY)) - pci_assign_unassigned_bus_resources(bus); + if (!pci_has_flag(PCI_PROBE_ONLY)) { + if (bus->self) + pci_assign_unassigned_bridge_resources(bus->self); + else + pci_assign_unassigned_bus_resources(bus); + } /* Fixup EEH */ eeh_add_device_tree_late(bus); @@ -1485,9 +1547,9 @@ static void pcibios_setup_phb_resources(struct pci_controller *hose, res = &hose->io_resource; if (!res->flags) { - pr_info("PCI: I/O resource not set for host" - " bridge %s (domain %d)\n", - hose->dn->full_name, hose->global_number); + pr_debug("PCI: I/O resource not set for host" + " bridge %s (domain %d)\n", + hose->dn->full_name, hose->global_number); } else { offset = pcibios_io_space_offset(hose); diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index a5ae49a2dcc4..ed5e9ff61a68 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -81,7 +81,7 @@ int pcibios_unmap_io_space(struct pci_bus *bus) /* If this is not a PHB, we only flush the hash table over * the area mapped by this bridge. We don't play with the PTE - * mappings since we might have to deal with sub-page alignemnts + * mappings since we might have to deal with sub-page alignments * so flushing the hash table is the only sane way to make sure * that no hash entries are covering that removed bridge area * while still allowing other busses overlapping those pages diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c index ecdccce78719..592693437070 100644 --- a/arch/powerpc/kernel/pci_dn.c +++ b/arch/powerpc/kernel/pci_dn.c @@ -31,6 +31,7 @@ #include <asm/pci-bridge.h> #include <asm/ppc-pci.h> #include <asm/firmware.h> +#include <asm/eeh.h> /* * The function is used to find the firmware data of one @@ -181,7 +182,6 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) { #ifdef CONFIG_PCI_IOV struct pci_dn *parent, *pdn; - struct eeh_dev *edev; int i; /* Only support IOV for now */ @@ -199,6 +199,8 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) return NULL; for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) { + struct eeh_dev *edev __maybe_unused; + pdn = add_one_dev_pci_data(parent, NULL, i, pci_iov_virtfn_bus(pdev, i), pci_iov_virtfn_devfn(pdev, i)); @@ -208,11 +210,12 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev) return NULL; } +#ifdef CONFIG_EEH /* Create the EEH device for the VF */ - eeh_dev_init(pdn, pci_bus_to_host(pdev->bus)); - edev = pdn_to_eeh_dev(pdn); + edev = eeh_dev_init(pdn); BUG_ON(!edev); edev->physfn = pdev; +#endif /* CONFIG_EEH */ } #endif /* CONFIG_PCI_IOV */ @@ -224,7 +227,6 @@ void remove_dev_pci_data(struct pci_dev *pdev) #ifdef CONFIG_PCI_IOV struct pci_dn *parent; struct pci_dn *pdn, *tmp; - struct eeh_dev *edev; int i; /* @@ -260,18 +262,22 @@ void remove_dev_pci_data(struct pci_dev *pdev) * a batch mode. */ for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) { + struct eeh_dev *edev __maybe_unused; + list_for_each_entry_safe(pdn, tmp, &parent->child_list, list) { if (pdn->busno != pci_iov_virtfn_bus(pdev, i) || pdn->devfn != pci_iov_virtfn_devfn(pdev, i)) continue; +#ifdef CONFIG_EEH /* Release EEH device for the VF */ edev = pdn_to_eeh_dev(pdn); if (edev) { pdn->edev = NULL; kfree(edev); } +#endif /* CONFIG_EEH */ if (!list_empty(&pdn->list)) list_del(&pdn->list); @@ -289,8 +295,11 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose, const __be32 *regs; struct device_node *parent; struct pci_dn *pdn; +#ifdef CONFIG_EEH + struct eeh_dev *edev; +#endif - pdn = zalloc_maybe_bootmem(sizeof(*pdn), GFP_KERNEL); + pdn = kzalloc(sizeof(*pdn), GFP_KERNEL); if (pdn == NULL) return NULL; dn->data = pdn; @@ -319,6 +328,15 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose, /* Extended config space */ pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1); + /* Create EEH device */ +#ifdef CONFIG_EEH + edev = eeh_dev_init(pdn); + if (!edev) { + kfree(pdn); + return NULL; + } +#endif + /* Attach to parent node */ INIT_LIST_HEAD(&pdn->child_list); INIT_LIST_HEAD(&pdn->list); @@ -504,15 +522,19 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb) * pci device found underneath. This routine runs once, * early in the boot sequence. */ -void __init pci_devs_phb_init(void) +static int __init pci_devs_phb_init(void) { struct pci_controller *phb, *tmp; /* This must be done first so the device nodes have valid pci info! */ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) pci_devs_phb_init_dynamic(phb); + + return 0; } +core_initcall(pci_devs_phb_init); + static void pci_dev_pdn_setup(struct pci_dev *pdev) { struct pci_dn *pdn; diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 0b93893424f5..a8cca88e972f 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -139,12 +139,16 @@ EXPORT_SYMBOL(__msr_check_and_clear); #ifdef CONFIG_PPC_FPU void __giveup_fpu(struct task_struct *tsk) { + unsigned long msr; + save_fpu(tsk); - tsk->thread.regs->msr &= ~MSR_FP; + msr = tsk->thread.regs->msr; + msr &= ~MSR_FP; #ifdef CONFIG_VSX if (cpu_has_feature(CPU_FTR_VSX)) - tsk->thread.regs->msr &= ~MSR_VSX; + msr &= ~MSR_VSX; #endif + tsk->thread.regs->msr = msr; } void giveup_fpu(struct task_struct *tsk) @@ -219,12 +223,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; } static void __giveup_altivec(struct task_struct *tsk) { + unsigned long msr; + save_altivec(tsk); - tsk->thread.regs->msr &= ~MSR_VEC; + msr = tsk->thread.regs->msr; + msr &= ~MSR_VEC; #ifdef CONFIG_VSX if (cpu_has_feature(CPU_FTR_VSX)) - tsk->thread.regs->msr &= ~MSR_VSX; + msr &= ~MSR_VSX; #endif + tsk->thread.regs->msr = msr; } void giveup_altivec(struct task_struct *tsk) @@ -794,7 +802,7 @@ static void tm_reclaim_thread(struct thread_struct *thr, * this state. * We do this using the current MSR, rather tracking it in * some specific thread_struct bit, as it has the additional - * benifit of checking for a potential TM bad thing exception. + * benefit of checking for a potential TM bad thing exception. */ if (!MSR_TM_SUSPENDED(mfmsr())) return; @@ -1009,6 +1017,14 @@ static inline void save_sprs(struct thread_struct *t) */ t->tar = mfspr(SPRN_TAR); } + + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + /* Conditionally save Load Monitor registers, if enabled */ + if (t->fscr & FSCR_LM) { + t->lmrr = mfspr(SPRN_LMRR); + t->lmser = mfspr(SPRN_LMSER); + } + } #endif } @@ -1023,18 +1039,11 @@ static inline void restore_sprs(struct thread_struct *old_thread, #ifdef CONFIG_PPC_BOOK3S_64 if (cpu_has_feature(CPU_FTR_DSCR)) { u64 dscr = get_paca()->dscr_default; - u64 fscr = old_thread->fscr & ~FSCR_DSCR; - - if (new_thread->dscr_inherit) { + if (new_thread->dscr_inherit) dscr = new_thread->dscr; - fscr |= FSCR_DSCR; - } if (old_thread->dscr != dscr) mtspr(SPRN_DSCR, dscr); - - if (old_thread->fscr != fscr) - mtspr(SPRN_FSCR, fscr); } if (cpu_has_feature(CPU_FTR_ARCH_207S)) { @@ -1045,9 +1054,22 @@ static inline void restore_sprs(struct thread_struct *old_thread, if (old_thread->ebbrr != new_thread->ebbrr) mtspr(SPRN_EBBRR, new_thread->ebbrr); + if (old_thread->fscr != new_thread->fscr) + mtspr(SPRN_FSCR, new_thread->fscr); + if (old_thread->tar != new_thread->tar) mtspr(SPRN_TAR, new_thread->tar); } + + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + /* Conditionally restore Load Monitor registers, if enabled */ + if (new_thread->fscr & FSCR_LM) { + if (old_thread->lmrr != new_thread->lmrr) + mtspr(SPRN_LMRR, new_thread->lmrr); + if (old_thread->lmser != new_thread->lmser) + mtspr(SPRN_LMSER, new_thread->lmser); + } + } #endif } diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 946e34ffeae9..bae3db791150 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -56,6 +56,8 @@ #include <asm/opal.h> #include <asm/fadump.h> #include <asm/debug.h> +#include <asm/epapr_hcalls.h> +#include <asm/firmware.h> #include <mm/mmu_decl.h> @@ -645,6 +647,14 @@ static void __init early_reserve_mem(void) #endif } +static bool disable_radix; +static int __init parse_disable_radix(char *p) +{ + disable_radix = true; + return 0; +} +early_param("disable_radix", parse_disable_radix); + void __init early_init_devtree(void *params) { phys_addr_t limit; @@ -734,11 +744,26 @@ void __init early_init_devtree(void *params) */ spinning_secondaries = boot_cpu_count - 1; #endif + /* + * now fixup radix MMU mode based on kernel command line + */ + if (disable_radix) + cur_cpu_spec->mmu_features &= ~MMU_FTR_RADIX; #ifdef CONFIG_PPC_POWERNV /* Scan and build the list of machine check recoverable ranges */ of_scan_flat_dt(early_init_dt_scan_recoverable_ranges, NULL); #endif + epapr_paravirt_early_init(); + + /* Now try to figure out if we are running on LPAR and so on */ + pseries_probe_fw_features(); + +#ifdef CONFIG_PPC_PS3 + /* Identify PS3 firmware */ + if (of_flat_dt_is_compatible(of_get_flat_dt_root(), "sony,ps3")) + powerpc_firmware_features |= FW_FEATURE_PS3_POSSIBLE; +#endif DBG(" <- early_init_devtree()\n"); } diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index fb2fb3ea85e5..c82eed97bd22 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -698,7 +698,7 @@ static void check_location(struct seq_file *m, const char *c) /* * Format: * ${LETTER}${NUMBER}[[-/]${LETTER}${NUMBER} [ ... ] ] - * the '.' may be an abbrevation + * the '.' may be an abbreviation */ static void check_location_string(struct seq_file *m, const char *c) { diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 28736ff27fea..6a3e5de544ce 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -685,7 +685,7 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value) return rc; } -void rtas_restart(char *cmd) +void __noreturn rtas_restart(char *cmd) { if (rtas_flash_term_hook) rtas_flash_term_hook(SYS_RESTART); @@ -704,7 +704,7 @@ void rtas_power_off(void) for (;;); } -void rtas_halt(void) +void __noreturn rtas_halt(void) { if (rtas_flash_term_hook) rtas_flash_term_hook(SYS_HALT); @@ -1070,7 +1070,7 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs) nret = be32_to_cpu(args.nret); token = be32_to_cpu(args.token); - if (nargs > ARRAY_SIZE(args.args) + if (nargs >= ARRAY_SIZE(args.args) || nret > ARRAY_SIZE(args.args) || nargs + nret > ARRAY_SIZE(args.args)) return -EINVAL; @@ -1174,7 +1174,7 @@ void __init rtas_initialize(void) * the stop-self token if any */ #ifdef CONFIG_PPC64 - if (machine_is(pseries) && firmware_has_feature(FW_FEATURE_LPAR)) { + if (firmware_has_feature(FW_FEATURE_LPAR)) { rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX); ibm_suspend_me_token = rtas_token("ibm,suspend-me"); } diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c index c638e2487a9c..a26a02006576 100644 --- a/arch/powerpc/kernel/rtasd.c +++ b/arch/powerpc/kernel/rtasd.c @@ -483,7 +483,7 @@ static void rtas_event_scan(struct work_struct *w) } #ifdef CONFIG_PPC64 -static void retreive_nvram_error_log(void) +static void retrieve_nvram_error_log(void) { unsigned int err_type ; int rc ; @@ -501,7 +501,7 @@ static void retreive_nvram_error_log(void) } } #else /* CONFIG_PPC64 */ -static void retreive_nvram_error_log(void) +static void retrieve_nvram_error_log(void) { } #endif /* CONFIG_PPC64 */ @@ -513,7 +513,7 @@ static void start_event_scan(void) (30000 / rtas_event_scan_rate)); /* Retrieve errors from nvram if any */ - retreive_nvram_error_log(); + retrieve_nvram_error_log(); schedule_delayed_work_on(cpumask_first(cpu_online_mask), &event_scan_work, event_scan_delay); @@ -526,10 +526,8 @@ void rtas_cancel_event_scan(void) } EXPORT_SYMBOL_GPL(rtas_cancel_event_scan); -static int __init rtas_init(void) +static int __init rtas_event_scan_init(void) { - struct proc_dir_entry *entry; - if (!machine_is(pseries) && !machine_is(chrp)) return 0; @@ -562,13 +560,27 @@ static int __init rtas_init(void) return -ENOMEM; } + start_event_scan(); + + return 0; +} +arch_initcall(rtas_event_scan_init); + +static int __init rtas_init(void) +{ + struct proc_dir_entry *entry; + + if (!machine_is(pseries) && !machine_is(chrp)) + return 0; + + if (!rtas_log_buf) + return -ENODEV; + entry = proc_create("powerpc/rtas/error_log", S_IRUSR, NULL, &proc_rtas_log_operations); if (!entry) printk(KERN_ERR "Failed to create error_log proc entry\n"); - start_event_scan(); - return 0; } __initcall(rtas_init); diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 8ca79b7503d8..714b4ba7ab86 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include <linux/percpu.h> #include <linux/memblock.h> #include <linux/of_platform.h> +#include <linux/hugetlb.h> #include <asm/io.h> #include <asm/paca.h> #include <asm/prom.h> @@ -61,6 +62,12 @@ #include <asm/cputhreads.h> #include <mm/mmu_decl.h> #include <asm/fadump.h> +#include <asm/udbg.h> +#include <asm/hugetlb.h> +#include <asm/livepatch.h> +#include <asm/mmu_context.h> + +#include "setup.h" #ifdef DEBUG #include <asm/udbg.h> @@ -494,7 +501,7 @@ void __init smp_setup_cpu_maps(void) * On pSeries LPAR, we need to know how many cpus * could possibly be added to this partition. */ - if (machine_is(pseries) && firmware_has_feature(FW_FEATURE_LPAR) && + if (firmware_has_feature(FW_FEATURE_LPAR) && (dn = of_find_node_by_path("/rtas"))) { int num_addr_cell, num_size_cell, maxcpus; const __be32 *ireg; @@ -575,6 +582,7 @@ void probe_machine(void) { extern struct machdep_calls __machine_desc_start; extern struct machdep_calls __machine_desc_end; + unsigned int i; /* * Iterate all ppc_md structures until we find the proper @@ -582,6 +590,17 @@ void probe_machine(void) */ DBG("Probing machine type ...\n"); + /* + * Check ppc_md is empty, if not we have a bug, ie, we setup an + * entry before probe_machine() which will be overwritten + */ + for (i = 0; i < (sizeof(ppc_md) / sizeof(void *)); i++) { + if (((void **)&ppc_md)[i]) { + printk(KERN_ERR "Entry %d in ppc_md non empty before" + " machine probe !\n", i); + } + } + for (machine_id = &__machine_desc_start; machine_id < &__machine_desc_end; machine_id++) { @@ -676,6 +695,8 @@ static struct notifier_block ppc_panic_block = { void __init setup_panic(void) { + if (!ppc_md.panic) + return; atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block); } @@ -744,3 +765,169 @@ void arch_setup_pdev_archdata(struct platform_device *pdev) pdev->dev.dma_mask = &pdev->archdata.dma_mask; set_dma_ops(&pdev->dev, &dma_direct_ops); } + +static __init void print_system_info(void) +{ + pr_info("-----------------------------------------------------\n"); +#ifdef CONFIG_PPC_STD_MMU_64 + pr_info("ppc64_pft_size = 0x%llx\n", ppc64_pft_size); +#endif +#ifdef CONFIG_PPC_STD_MMU_32 + pr_info("Hash_size = 0x%lx\n", Hash_size); +#endif + pr_info("phys_mem_size = 0x%llx\n", + (unsigned long long)memblock_phys_mem_size()); + + pr_info("dcache_bsize = 0x%x\n", dcache_bsize); + pr_info("icache_bsize = 0x%x\n", icache_bsize); + if (ucache_bsize != 0) + pr_info("ucache_bsize = 0x%x\n", ucache_bsize); + + pr_info("cpu_features = 0x%016lx\n", cur_cpu_spec->cpu_features); + pr_info(" possible = 0x%016lx\n", + (unsigned long)CPU_FTRS_POSSIBLE); + pr_info(" always = 0x%016lx\n", + (unsigned long)CPU_FTRS_ALWAYS); + pr_info("cpu_user_features = 0x%08x 0x%08x\n", + cur_cpu_spec->cpu_user_features, + cur_cpu_spec->cpu_user_features2); + pr_info("mmu_features = 0x%08x\n", cur_cpu_spec->mmu_features); +#ifdef CONFIG_PPC64 + pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features); +#endif + +#ifdef CONFIG_PPC_STD_MMU_64 + if (htab_address) + pr_info("htab_address = 0x%p\n", htab_address); + if (htab_hash_mask) + pr_info("htab_hash_mask = 0x%lx\n", htab_hash_mask); +#endif +#ifdef CONFIG_PPC_STD_MMU_32 + if (Hash) + pr_info("Hash = 0x%p\n", Hash); + if (Hash_mask) + pr_info("Hash_mask = 0x%lx\n", Hash_mask); +#endif + + if (PHYSICAL_START > 0) + pr_info("physical_start = 0x%llx\n", + (unsigned long long)PHYSICAL_START); + pr_info("-----------------------------------------------------\n"); +} + +/* + * Called into from start_kernel this initializes memblock, which is used + * to manage page allocation until mem_init is called. + */ +void __init setup_arch(char **cmdline_p) +{ + *cmdline_p = boot_command_line; + + /* Set a half-reasonable default so udelay does something sensible */ + loops_per_jiffy = 500000000 / HZ; + + /* Unflatten the device-tree passed by prom_init or kexec */ + unflatten_device_tree(); + + /* + * Initialize cache line/block info from device-tree (on ppc64) or + * just cputable (on ppc32). + */ + initialize_cache_info(); + + /* Initialize RTAS if available. */ + rtas_initialize(); + + /* Check if we have an initrd provided via the device-tree. */ + check_for_initrd(); + + /* Probe the machine type, establish ppc_md. */ + probe_machine(); + + /* Setup panic notifier if requested by the platform. */ + setup_panic(); + + /* + * Configure ppc_md.power_save (ppc32 only, 64-bit machines do + * it from their respective probe() function. + */ + setup_power_save(); + + /* Discover standard serial ports. */ + find_legacy_serial_ports(); + + /* Register early console with the printk subsystem. */ + register_early_udbg_console(); + + /* Setup the various CPU maps based on the device-tree. */ + smp_setup_cpu_maps(); + + /* Initialize xmon. */ + xmon_setup(); + + /* Check the SMT related command line arguments (ppc64). */ + check_smt_enabled(); + + /* On BookE, setup per-core TLB data structures. */ + setup_tlb_core_data(); + + /* + * Release secondary cpus out of their spinloops at 0x60 now that + * we can map physical -> logical CPU ids. + * + * Freescale Book3e parts spin in a loop provided by firmware, + * so smp_release_cpus() does nothing for them. + */ +#ifdef CONFIG_SMP + smp_release_cpus(); +#endif + + /* Print various info about the machine that has been gathered so far. */ + print_system_info(); + + /* Reserve large chunks of memory for use by CMA for KVM. */ + kvm_cma_reserve(); + + /* + * Reserve any gigantic pages requested on the command line. + * memblock needs to have been initialized by the time this is + * called since this will reserve memory. + */ + reserve_hugetlb_gpages(); + + klp_init_thread_info(&init_thread_info); + + init_mm.start_code = (unsigned long)_stext; + init_mm.end_code = (unsigned long) _etext; + init_mm.end_data = (unsigned long) _edata; + init_mm.brk = klimit; +#ifdef CONFIG_PPC_64K_PAGES + init_mm.context.pte_frag = NULL; +#endif +#ifdef CONFIG_SPAPR_TCE_IOMMU + mm_iommu_init(&init_mm.context); +#endif + irqstack_early_init(); + exc_lvl_early_init(); + emergency_stack_init(); + + initmem_init(); + +#ifdef CONFIG_DUMMY_CONSOLE + conswitchp = &dummy_con; +#endif + if (ppc_md.setup_arch) + ppc_md.setup_arch(); + + paging_init(); + + /* Initialize the MMU context management stuff. */ + mmu_context_init(); + +#ifdef CONFIG_PPC64 + /* Interrupt code needs to be 64K-aligned. */ + if ((unsigned long)_stext & 0xffff) + panic("Kernelbase not 64K-aligned (0x%lx)!\n", + (unsigned long)_stext); +#endif +} diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h new file mode 100644 index 000000000000..cfba134b3024 --- /dev/null +++ b/arch/powerpc/kernel/setup.h @@ -0,0 +1,58 @@ +/* + * Prototypes for functions that are shared between setup_(32|64|common).c + * + * Copyright 2016 Michael Ellerman, IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef __ARCH_POWERPC_KERNEL_SETUP_H +#define __ARCH_POWERPC_KERNEL_SETUP_H + +void initialize_cache_info(void); +void irqstack_early_init(void); + +#ifdef CONFIG_PPC32 +void setup_power_save(void); +#else +static inline void setup_power_save(void) { }; +#endif + +#if defined(CONFIG_PPC64) && defined(CONFIG_SMP) +void check_smt_enabled(void); +#else +static inline void check_smt_enabled(void) { }; +#endif + +#if defined(CONFIG_PPC_BOOK3E) && defined(CONFIG_SMP) +void setup_tlb_core_data(void); +#else +static inline void setup_tlb_core_data(void) { }; +#endif + +#if defined(CONFIG_PPC_BOOK3E) || defined(CONFIG_BOOKE) || defined(CONFIG_40x) +void exc_lvl_early_init(void); +#else +static inline void exc_lvl_early_init(void) { }; +#endif + +#ifdef CONFIG_PPC64 +void emergency_stack_init(void); +#else +static inline void emergency_stack_init(void) { }; +#endif + +/* + * Having this in kvm_ppc.h makes include dependencies too + * tricky to solve for setup-common.c so have it here. + */ +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE +void kvm_cma_reserve(void); +#else +static inline void kvm_cma_reserve(void) { }; +#endif + +#endif /* __ARCH_POWERPC_KERNEL_SETUP_H */ diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index d544fa311757..00f57754407e 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -36,8 +36,6 @@ #include <asm/time.h> #include <asm/serial.h> #include <asm/udbg.h> -#include <asm/mmu_context.h> -#include <asm/epapr_hcalls.h> #include <asm/code-patching.h> #define DBG(fmt...) @@ -62,9 +60,7 @@ int icache_bsize; int ucache_bsize; /* - * We're called here very early in the boot. We determine the machine - * type and call the appropriate low-level setup functions. - * -- Cort <cort@fsmlabs.com> + * We're called here very early in the boot. * * Note that the kernel may be running at an address which is different * from the address that it was linked at, so we must use RELOC/PTRRELOC @@ -73,7 +69,6 @@ int ucache_bsize; notrace unsigned long __init early_init(unsigned long dt_ptr) { unsigned long offset = reloc_offset(); - struct cpu_spec *spec; /* First zero the BSS -- use memset_io, some platforms don't have * caches on yet */ @@ -84,27 +79,19 @@ notrace unsigned long __init early_init(unsigned long dt_ptr) * Identify the CPU type and fix up code sections * that depend on which cpu we have. */ - spec = identify_cpu(offset, mfspr(SPRN_PVR)); + identify_cpu(offset, mfspr(SPRN_PVR)); - do_feature_fixups(spec->cpu_features, - PTRRELOC(&__start___ftr_fixup), - PTRRELOC(&__stop___ftr_fixup)); - - do_feature_fixups(spec->mmu_features, - PTRRELOC(&__start___mmu_ftr_fixup), - PTRRELOC(&__stop___mmu_ftr_fixup)); - - do_lwsync_fixups(spec->cpu_features, - PTRRELOC(&__start___lwsync_fixup), - PTRRELOC(&__stop___lwsync_fixup)); - - do_final_fixups(); + apply_feature_fixups(); return KERNELBASE + offset; } /* + * This is run before start_kernel(), the kernel has been relocated + * and we are running with enough of the MMU enabled to have our + * proper kernel virtual addresses + * * Find out what kind of machine we're on and save any data we need * from the early boot process (devtree is copied on pmac by prom_init()). * This is called very early on the boot process, after a minimal @@ -123,27 +110,9 @@ notrace void __init machine_init(u64 dt_ptr) /* Do some early initialization based on the flat device tree */ early_init_devtree(__va(dt_ptr)); - epapr_paravirt_early_init(); - early_init_mmu(); - probe_machine(); - setup_kdump_trampoline(); - -#ifdef CONFIG_6xx - if (cpu_has_feature(CPU_FTR_CAN_DOZE) || - cpu_has_feature(CPU_FTR_CAN_NAP)) - ppc_md.power_save = ppc6xx_idle; -#endif - -#ifdef CONFIG_E500 - if (cpu_has_feature(CPU_FTR_CAN_DOZE) || - cpu_has_feature(CPU_FTR_CAN_NAP)) - ppc_md.power_save = e500_idle; -#endif - if (ppc_md.progress) - ppc_md.progress("id mach(): done", 0x200); } /* Checks "l2cr=xxxx" command-line option */ @@ -221,7 +190,7 @@ int __init ppc_init(void) arch_initcall(ppc_init); -static void __init irqstack_early_init(void) +void __init irqstack_early_init(void) { unsigned int i; @@ -236,7 +205,7 @@ static void __init irqstack_early_init(void) } #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) -static void __init exc_lvl_early_init(void) +void __init exc_lvl_early_init(void) { unsigned int i, hw_cpu; @@ -259,33 +228,25 @@ static void __init exc_lvl_early_init(void) #endif } } -#else -#define exc_lvl_early_init() #endif -/* Warning, IO base is not yet inited */ -void __init setup_arch(char **cmdline_p) +void __init setup_power_save(void) { - *cmdline_p = boot_command_line; - - /* so udelay does something sensible, assume <= 1000 bogomips */ - loops_per_jiffy = 500000000 / HZ; - - unflatten_device_tree(); - check_for_initrd(); - - if (ppc_md.init_early) - ppc_md.init_early(); - - find_legacy_serial_ports(); - - smp_setup_cpu_maps(); - - /* Register early console */ - register_early_udbg_console(); +#ifdef CONFIG_6xx + if (cpu_has_feature(CPU_FTR_CAN_DOZE) || + cpu_has_feature(CPU_FTR_CAN_NAP)) + ppc_md.power_save = ppc6xx_idle; +#endif - xmon_setup(); +#ifdef CONFIG_E500 + if (cpu_has_feature(CPU_FTR_CAN_DOZE) || + cpu_has_feature(CPU_FTR_CAN_NAP)) + ppc_md.power_save = e500_idle; +#endif +} +__init void initialize_cache_info(void) +{ /* * Set cache line size based on type of cpu as a default. * Systems with OF can look in the properties on the cpu node(s) @@ -296,32 +257,4 @@ void __init setup_arch(char **cmdline_p) ucache_bsize = 0; if (cpu_has_feature(CPU_FTR_UNIFIED_ID_CACHE)) ucache_bsize = icache_bsize = dcache_bsize; - - if (ppc_md.panic) - setup_panic(); - - init_mm.start_code = (unsigned long)_stext; - init_mm.end_code = (unsigned long) _etext; - init_mm.end_data = (unsigned long) _edata; - init_mm.brk = klimit; - - exc_lvl_early_init(); - - irqstack_early_init(); - - initmem_init(); - if ( ppc_md.progress ) ppc_md.progress("setup_arch: initmem", 0x3eab); - -#ifdef CONFIG_DUMMY_CONSOLE - conswitchp = &dummy_con; -#endif - - if (ppc_md.setup_arch) - ppc_md.setup_arch(); - if ( ppc_md.progress ) ppc_md.progress("arch: exit", 0x3eab); - - paging_init(); - - /* Initialize the MMU context management stuff */ - mmu_context_init(); } diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 96d4a2b23d0f..d8216aed22b7 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -35,7 +35,6 @@ #include <linux/pci.h> #include <linux/lockdep.h> #include <linux/memblock.h> -#include <linux/hugetlb.h> #include <linux/memory.h> #include <linux/nmi.h> @@ -64,12 +63,10 @@ #include <asm/xmon.h> #include <asm/udbg.h> #include <asm/kexec.h> -#include <asm/mmu_context.h> #include <asm/code-patching.h> -#include <asm/kvm_ppc.h> -#include <asm/hugetlb.h> -#include <asm/epapr_hcalls.h> #include <asm/livepatch.h> +#include <asm/opal.h> +#include <asm/cputhreads.h> #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -100,7 +97,7 @@ int icache_bsize; int ucache_bsize; #if defined(CONFIG_PPC_BOOK3E) && defined(CONFIG_SMP) -static void setup_tlb_core_data(void) +void __init setup_tlb_core_data(void) { int cpu; @@ -133,10 +130,6 @@ static void setup_tlb_core_data(void) } } } -#else -static void setup_tlb_core_data(void) -{ -} #endif #ifdef CONFIG_SMP @@ -144,7 +137,7 @@ static void setup_tlb_core_data(void) static char *smt_enabled_cmdline; /* Look for ibm,smt-enabled OF option */ -static void check_smt_enabled(void) +void __init check_smt_enabled(void) { struct device_node *dn; const char *smt_option; @@ -193,12 +186,10 @@ static int __init early_smt_enabled(char *p) } early_param("smt-enabled", early_smt_enabled); -#else -#define check_smt_enabled() #endif /* CONFIG_SMP */ /** Fix up paca fields required for the boot cpu */ -static void fixup_boot_paca(void) +static void __init fixup_boot_paca(void) { /* The boot cpu is started */ get_paca()->cpu_start = 1; @@ -206,23 +197,50 @@ static void fixup_boot_paca(void) get_paca()->data_offset = 0; } -static void cpu_ready_for_interrupts(void) +static void __init configure_exceptions(void) { - /* Set IR and DR in PACA MSR */ - get_paca()->kernel_msr = MSR_KERNEL; - /* - * Enable AIL if supported, and we are in hypervisor mode. If we are - * not in hypervisor mode, we enable relocation-on interrupts later - * in pSeries_setup_arch() using the H_SET_MODE hcall. + * Setup the trampolines from the lowmem exception vectors + * to the kdump kernel when not using a relocatable kernel. */ - if (cpu_has_feature(CPU_FTR_HVMODE) && - cpu_has_feature(CPU_FTR_ARCH_207S)) { - unsigned long lpcr = mfspr(SPRN_LPCR); - mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3); + setup_kdump_trampoline(); + + /* Under a PAPR hypervisor, we need hypercalls */ + if (firmware_has_feature(FW_FEATURE_SET_MODE)) { + /* Enable AIL if possible */ + pseries_enable_reloc_on_exc(); + + /* + * Tell the hypervisor that we want our exceptions to + * be taken in little endian mode. + * + * We don't call this for big endian as our calling convention + * makes us always enter in BE, and the call may fail under + * some circumstances with kdump. + */ +#ifdef __LITTLE_ENDIAN__ + pseries_little_endian_exceptions(); +#endif + } else { + /* Set endian mode using OPAL */ + if (firmware_has_feature(FW_FEATURE_OPAL)) + opal_configure_cores(); + + /* Enable AIL if supported, and we are in hypervisor mode */ + if (cpu_has_feature(CPU_FTR_HVMODE) && + cpu_has_feature(CPU_FTR_ARCH_207S)) { + unsigned long lpcr = mfspr(SPRN_LPCR); + mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3); + } } } +static void cpu_ready_for_interrupts(void) +{ + /* Set IR and DR in PACA MSR */ + get_paca()->kernel_msr = MSR_KERNEL; +} + /* * Early initialization entry point. This is called by head.S * with MMU translation disabled. We rely on the "feature" of @@ -270,22 +288,22 @@ void __init early_setup(unsigned long dt_ptr) */ early_init_devtree(__va(dt_ptr)); - epapr_paravirt_early_init(); - /* Now we know the logical id of our boot cpu, setup the paca. */ setup_paca(&paca[boot_cpuid]); fixup_boot_paca(); - /* Probe the machine type */ - probe_machine(); - - setup_kdump_trampoline(); - - DBG("Found, Initializing memory management...\n"); + /* + * Configure exception handlers. This include setting up trampolines + * if needed, setting exception endian mode, etc... + */ + configure_exceptions(); /* Initialize the hash table or TLB handling */ early_init_mmu(); + /* Apply all the dynamic patching */ + apply_feature_fixups(); + /* * At this point, we can let interrupts switch to virtual mode * (the MMU has been setup), so adjust the MSR in the PACA to @@ -293,16 +311,6 @@ void __init early_setup(unsigned long dt_ptr) */ cpu_ready_for_interrupts(); - /* Reserve large chunks of memory for use by CMA for KVM */ - kvm_cma_reserve(); - - /* - * Reserve any gigantic pages requested on the command line. - * memblock needs to have been initialized by the time this is - * called since this will reserve memory. - */ - reserve_hugetlb_gpages(); - DBG(" <- early_setup()\n"); #ifdef CONFIG_PPC_EARLY_DEBUG_BOOTX @@ -321,7 +329,7 @@ void __init early_setup(unsigned long dt_ptr) #ifdef CONFIG_SMP void early_setup_secondary(void) { - /* Mark interrupts enabled in PACA */ + /* Mark interrupts disabled in PACA */ get_paca()->soft_enabled = 0; /* Initialize the hash table or TLB handling */ @@ -391,7 +399,7 @@ void smp_release_cpus(void) * cache informations about the CPU that will be used by cache flush * routines and/or provided to userland */ -static void __init initialize_cache_info(void) +void __init initialize_cache_info(void) { struct device_node *np; unsigned long num_cpus = 0; @@ -456,127 +464,11 @@ static void __init initialize_cache_info(void) } } - DBG(" <- initialize_cache_info()\n"); -} - - -/* - * Do some initial setup of the system. The parameters are those which - * were passed in from the bootloader. - */ -void __init setup_system(void) -{ - DBG(" -> setup_system()\n"); - - /* Apply the CPUs-specific and firmware specific fixups to kernel - * text (nop out sections not relevant to this CPU or this firmware) - */ - do_feature_fixups(cur_cpu_spec->cpu_features, - &__start___ftr_fixup, &__stop___ftr_fixup); - do_feature_fixups(cur_cpu_spec->mmu_features, - &__start___mmu_ftr_fixup, &__stop___mmu_ftr_fixup); - do_feature_fixups(powerpc_firmware_features, - &__start___fw_ftr_fixup, &__stop___fw_ftr_fixup); - do_lwsync_fixups(cur_cpu_spec->cpu_features, - &__start___lwsync_fixup, &__stop___lwsync_fixup); - do_final_fixups(); - - /* - * Unflatten the device-tree passed by prom_init or kexec - */ - unflatten_device_tree(); - - /* - * Fill the ppc64_caches & systemcfg structures with informations - * retrieved from the device-tree. - */ - initialize_cache_info(); - -#ifdef CONFIG_PPC_RTAS - /* - * Initialize RTAS if available - */ - rtas_initialize(); -#endif /* CONFIG_PPC_RTAS */ - - /* - * Check if we have an initrd provided via the device-tree - */ - check_for_initrd(); - - /* - * Do some platform specific early initializations, that includes - * setting up the hash table pointers. It also sets up some interrupt-mapping - * related options that will be used by finish_device_tree() - */ - if (ppc_md.init_early) - ppc_md.init_early(); - - /* - * We can discover serial ports now since the above did setup the - * hash table management for us, thus ioremap works. We do that early - * so that further code can be debugged - */ - find_legacy_serial_ports(); - - /* - * Register early console - */ - register_early_udbg_console(); - - /* - * Initialize xmon - */ - xmon_setup(); - - smp_setup_cpu_maps(); - check_smt_enabled(); - setup_tlb_core_data(); - - /* - * Freescale Book3e parts spin in a loop provided by firmware, - * so smp_release_cpus() does nothing for them - */ -#if defined(CONFIG_SMP) - /* Release secondary cpus out of their spinloops at 0x60 now that - * we can map physical -> logical CPU ids - */ - smp_release_cpus(); -#endif - - pr_info("Starting Linux %s %s\n", init_utsname()->machine, - init_utsname()->version); - - pr_info("-----------------------------------------------------\n"); - pr_info("ppc64_pft_size = 0x%llx\n", ppc64_pft_size); - pr_info("phys_mem_size = 0x%llx\n", memblock_phys_mem_size()); - - if (ppc64_caches.dline_size != 0x80) - pr_info("dcache_line_size = 0x%x\n", ppc64_caches.dline_size); - if (ppc64_caches.iline_size != 0x80) - pr_info("icache_line_size = 0x%x\n", ppc64_caches.iline_size); - - pr_info("cpu_features = 0x%016lx\n", cur_cpu_spec->cpu_features); - pr_info(" possible = 0x%016lx\n", CPU_FTRS_POSSIBLE); - pr_info(" always = 0x%016lx\n", CPU_FTRS_ALWAYS); - pr_info("cpu_user_features = 0x%08x 0x%08x\n", cur_cpu_spec->cpu_user_features, - cur_cpu_spec->cpu_user_features2); - pr_info("mmu_features = 0x%08x\n", cur_cpu_spec->mmu_features); - pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features); - -#ifdef CONFIG_PPC_STD_MMU_64 - if (htab_address) - pr_info("htab_address = 0x%p\n", htab_address); - - pr_info("htab_hash_mask = 0x%lx\n", htab_hash_mask); -#endif - - if (PHYSICAL_START > 0) - pr_info("physical_start = 0x%llx\n", - (unsigned long long)PHYSICAL_START); - pr_info("-----------------------------------------------------\n"); + /* For use by binfmt_elf */ + dcache_bsize = ppc64_caches.dline_size; + icache_bsize = ppc64_caches.iline_size; - DBG(" <- setup_system()\n"); + DBG(" <- initialize_cache_info()\n"); } /* This returns the limit below which memory accesses to the linear @@ -584,7 +476,7 @@ void __init setup_system(void) * used to allocate interrupt or emergency stacks for which our * exception entry path doesn't deal with being interrupted. */ -static u64 safe_stack_limit(void) +static __init u64 safe_stack_limit(void) { #ifdef CONFIG_PPC_BOOK3E /* Freescale BookE bolts the entire linear mapping */ @@ -600,7 +492,7 @@ static u64 safe_stack_limit(void) #endif } -static void __init irqstack_early_init(void) +void __init irqstack_early_init(void) { u64 limit = safe_stack_limit(); unsigned int i; @@ -620,7 +512,7 @@ static void __init irqstack_early_init(void) } #ifdef CONFIG_PPC_BOOK3E -static void __init exc_lvl_early_init(void) +void __init exc_lvl_early_init(void) { unsigned int i; unsigned long sp; @@ -642,8 +534,6 @@ static void __init exc_lvl_early_init(void) if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC)) patch_exception(0x040, exc_debug_debug_book3e); } -#else -#define exc_lvl_early_init() #endif /* @@ -651,7 +541,7 @@ static void __init exc_lvl_early_init(void) * early in SMP boots before relocation is enabled. Exclusive emergency * stack for machine checks. */ -static void __init emergency_stack_init(void) +void __init emergency_stack_init(void) { u64 limit; unsigned int i; @@ -682,61 +572,6 @@ static void __init emergency_stack_init(void) } } -/* - * Called into from start_kernel this initializes memblock, which is used - * to manage page allocation until mem_init is called. - */ -void __init setup_arch(char **cmdline_p) -{ - *cmdline_p = boot_command_line; - - /* - * Set cache line size based on type of cpu as a default. - * Systems with OF can look in the properties on the cpu node(s) - * for a possibly more accurate value. - */ - dcache_bsize = ppc64_caches.dline_size; - icache_bsize = ppc64_caches.iline_size; - - if (ppc_md.panic) - setup_panic(); - - klp_init_thread_info(&init_thread_info); - - init_mm.start_code = (unsigned long)_stext; - init_mm.end_code = (unsigned long) _etext; - init_mm.end_data = (unsigned long) _edata; - init_mm.brk = klimit; -#ifdef CONFIG_PPC_64K_PAGES - init_mm.context.pte_frag = NULL; -#endif -#ifdef CONFIG_SPAPR_TCE_IOMMU - mm_iommu_init(&init_mm.context); -#endif - irqstack_early_init(); - exc_lvl_early_init(); - emergency_stack_init(); - - initmem_init(); - -#ifdef CONFIG_DUMMY_CONSOLE - conswitchp = &dummy_con; -#endif - - if (ppc_md.setup_arch) - ppc_md.setup_arch(); - - paging_init(); - - /* Initialize the MMU context management stuff */ - mmu_context_init(); - - /* Interrupt code needs to be 64K-aligned */ - if ((unsigned long)_stext & 0xffff) - panic("Kernelbase not 64K-aligned (0x%lx)!\n", - (unsigned long)_stext); -} - #ifdef CONFIG_SMP #define PCPU_DYN_SIZE () diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 25520794aa37..7e49984d4331 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -104,6 +104,7 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs, */ #ifdef CONFIG_ALTIVEC elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc); + unsigned long vrsave; #endif unsigned long msr = regs->msr; long err = 0; @@ -125,9 +126,13 @@ static long setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs, /* We always copy to/from vrsave, it's 0 if we don't have or don't * use altivec. */ - if (cpu_has_feature(CPU_FTR_ALTIVEC)) - current->thread.vrsave = mfspr(SPRN_VRSAVE); - err |= __put_user(current->thread.vrsave, (u32 __user *)&v_regs[33]); + vrsave = 0; + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + vrsave = mfspr(SPRN_VRSAVE); + current->thread.vrsave = vrsave; + } + + err |= __put_user(vrsave, (u32 __user *)&v_regs[33]); #else /* CONFIG_ALTIVEC */ err |= __put_user(0, &sc->v_regs); #endif /* CONFIG_ALTIVEC */ diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 55c924b65f71..5a1f015ea9f3 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -31,6 +31,7 @@ #include <linux/cpu.h> #include <linux/notifier.h> #include <linux/topology.h> +#include <linux/profile.h> #include <asm/ptrace.h> #include <linux/atomic.h> @@ -53,6 +54,7 @@ #include <asm/vdso.h> #include <asm/debug.h> #include <asm/kexec.h> +#include <asm/asm-prototypes.h> #ifdef DEBUG #include <asm/udbg.h> @@ -593,6 +595,7 @@ out: of_node_put(np); return id; } +EXPORT_SYMBOL_GPL(cpu_to_core_id); /* Helper routines for cpu to core mapping */ int cpu_core_index_of_thread(int cpu) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 692873bff334..c4f1d1f7bae0 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -35,7 +35,7 @@ static DEFINE_PER_CPU(struct cpu, cpu_devices); #ifdef CONFIG_PPC64 /* Time in microseconds we delay before sleeping in the idle loop */ -DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 }; +static DEFINE_PER_CPU(long, smt_snooze_delay) = { 100 }; static ssize_t store_smt_snooze_delay(struct device *dev, struct device_attribute *attr, diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 3ed9a5a21d77..4e7759c8ca30 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -96,7 +96,8 @@ static struct clocksource clocksource_timebase = { .read = timebase_read, }; -#define DECREMENTER_MAX 0x7fffffff +#define DECREMENTER_DEFAULT_MAX 0x7FFFFFFF +u64 decrementer_max = DECREMENTER_DEFAULT_MAX; static int decrementer_set_next_event(unsigned long evt, struct clock_event_device *dev); @@ -166,7 +167,15 @@ DEFINE_PER_CPU(unsigned long, cputime_scaled_last_delta); cputime_t cputime_one_jiffy; +#ifdef CONFIG_PPC_SPLPAR void (*dtl_consumer)(struct dtl_entry *, u64); +#endif + +#ifdef CONFIG_PPC64 +#define get_accounting(tsk) (&get_paca()->accounting) +#else +#define get_accounting(tsk) (&task_thread_info(tsk)->accounting) +#endif static void calc_cputime_factors(void) { @@ -186,7 +195,7 @@ static void calc_cputime_factors(void) * Read the SPURR on systems that have it, otherwise the PURR, * or if that doesn't exist return the timebase value passed in. */ -static u64 read_spurr(u64 tb) +static unsigned long read_spurr(unsigned long tb) { if (cpu_has_feature(CPU_FTR_SPURR)) return mfspr(SPRN_SPURR); @@ -249,8 +258,8 @@ static u64 scan_dispatch_log(u64 stop_tb) void accumulate_stolen_time(void) { u64 sst, ust; - u8 save_soft_enabled = local_paca->soft_enabled; + struct cpu_accounting_data *acct = &local_paca->accounting; /* We are called early in the exception entry, before * soft/hard_enabled are sync'ed to the expected state @@ -260,10 +269,10 @@ void accumulate_stolen_time(void) */ local_paca->soft_enabled = 0; - sst = scan_dispatch_log(local_paca->starttime_user); - ust = scan_dispatch_log(local_paca->starttime); - local_paca->system_time -= sst; - local_paca->user_time -= ust; + sst = scan_dispatch_log(acct->starttime_user); + ust = scan_dispatch_log(acct->starttime); + acct->system_time -= sst; + acct->user_time -= ust; local_paca->stolen_time += ust + sst; local_paca->soft_enabled = save_soft_enabled; @@ -275,7 +284,7 @@ static inline u64 calculate_stolen_time(u64 stop_tb) if (get_paca()->dtl_ridx != be64_to_cpu(get_lppaca()->dtl_idx)) { stolen = scan_dispatch_log(stop_tb); - get_paca()->system_time -= stolen; + get_paca()->accounting.system_time -= stolen; } stolen += get_paca()->stolen_time; @@ -295,27 +304,29 @@ static inline u64 calculate_stolen_time(u64 stop_tb) * Account time for a transition between system, hard irq * or soft irq state. */ -static u64 vtime_delta(struct task_struct *tsk, - u64 *sys_scaled, u64 *stolen) +static unsigned long vtime_delta(struct task_struct *tsk, + unsigned long *sys_scaled, + unsigned long *stolen) { - u64 now, nowscaled, deltascaled; - u64 udelta, delta, user_scaled; + unsigned long now, nowscaled, deltascaled; + unsigned long udelta, delta, user_scaled; + struct cpu_accounting_data *acct = get_accounting(tsk); WARN_ON_ONCE(!irqs_disabled()); now = mftb(); nowscaled = read_spurr(now); - get_paca()->system_time += now - get_paca()->starttime; - get_paca()->starttime = now; - deltascaled = nowscaled - get_paca()->startspurr; - get_paca()->startspurr = nowscaled; + acct->system_time += now - acct->starttime; + acct->starttime = now; + deltascaled = nowscaled - acct->startspurr; + acct->startspurr = nowscaled; *stolen = calculate_stolen_time(now); - delta = get_paca()->system_time; - get_paca()->system_time = 0; - udelta = get_paca()->user_time - get_paca()->utime_sspurr; - get_paca()->utime_sspurr = get_paca()->user_time; + delta = acct->system_time; + acct->system_time = 0; + udelta = acct->user_time - acct->utime_sspurr; + acct->utime_sspurr = acct->user_time; /* * Because we don't read the SPURR on every kernel entry/exit, @@ -337,14 +348,14 @@ static u64 vtime_delta(struct task_struct *tsk, *sys_scaled = deltascaled; } } - get_paca()->user_time_scaled += user_scaled; + acct->user_time_scaled += user_scaled; return delta; } void vtime_account_system(struct task_struct *tsk) { - u64 delta, sys_scaled, stolen; + unsigned long delta, sys_scaled, stolen; delta = vtime_delta(tsk, &sys_scaled, &stolen); account_system_time(tsk, 0, delta, sys_scaled); @@ -355,7 +366,7 @@ EXPORT_SYMBOL_GPL(vtime_account_system); void vtime_account_idle(struct task_struct *tsk) { - u64 delta, sys_scaled, stolen; + unsigned long delta, sys_scaled, stolen; delta = vtime_delta(tsk, &sys_scaled, &stolen); account_idle_time(delta + stolen); @@ -373,15 +384,32 @@ void vtime_account_idle(struct task_struct *tsk) void vtime_account_user(struct task_struct *tsk) { cputime_t utime, utimescaled; + struct cpu_accounting_data *acct = get_accounting(tsk); - utime = get_paca()->user_time; - utimescaled = get_paca()->user_time_scaled; - get_paca()->user_time = 0; - get_paca()->user_time_scaled = 0; - get_paca()->utime_sspurr = 0; + utime = acct->user_time; + utimescaled = acct->user_time_scaled; + acct->user_time = 0; + acct->user_time_scaled = 0; + acct->utime_sspurr = 0; account_user_time(tsk, utime, utimescaled); } +#ifdef CONFIG_PPC32 +/* + * Called from the context switch with interrupts disabled, to charge all + * accumulated times to the current process, and to prepare accounting on + * the next process. + */ +void arch_vtime_task_switch(struct task_struct *prev) +{ + struct cpu_accounting_data *acct = get_accounting(current); + + acct->starttime = get_accounting(prev)->starttime; + acct->system_time = 0; + acct->user_time = 0; +} +#endif /* CONFIG_PPC32 */ + #else /* ! CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */ #define calc_cputime_factors() #endif @@ -504,8 +532,8 @@ static void __timer_interrupt(void) __this_cpu_inc(irq_stat.timer_irqs_event); } else { now = *next_tb - now; - if (now <= DECREMENTER_MAX) - set_dec((int)now); + if (now <= decrementer_max) + set_dec(now); /* We may have raced with new irq work */ if (test_irq_work_pending()) set_dec(1); @@ -535,7 +563,7 @@ void timer_interrupt(struct pt_regs * regs) /* Ensure a positive value is written to the decrementer, or else * some CPUs will continue to take decrementer exceptions. */ - set_dec(DECREMENTER_MAX); + set_dec(decrementer_max); /* Some implementations of hotplug will get timer interrupts while * offline, just ignore these and we also need to set @@ -583,9 +611,9 @@ static void generic_suspend_disable_irqs(void) * with suspending. */ - set_dec(DECREMENTER_MAX); + set_dec(decrementer_max); local_irq_disable(); - set_dec(DECREMENTER_MAX); + set_dec(decrementer_max); } static void generic_suspend_enable_irqs(void) @@ -866,7 +894,7 @@ static int decrementer_set_next_event(unsigned long evt, static int decrementer_shutdown(struct clock_event_device *dev) { - decrementer_set_next_event(DECREMENTER_MAX, dev); + decrementer_set_next_event(decrementer_max, dev); return 0; } @@ -892,6 +920,49 @@ static void register_decrementer_clockevent(int cpu) clockevents_register_device(dec); } +static void enable_large_decrementer(void) +{ + if (!cpu_has_feature(CPU_FTR_ARCH_300)) + return; + + if (decrementer_max <= DECREMENTER_DEFAULT_MAX) + return; + + /* + * If we're running as the hypervisor we need to enable the LD manually + * otherwise firmware should have done it for us. + */ + if (cpu_has_feature(CPU_FTR_HVMODE)) + mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_LD); +} + +static void __init set_decrementer_max(void) +{ + struct device_node *cpu; + u32 bits = 32; + + /* Prior to ISAv3 the decrementer is always 32 bit */ + if (!cpu_has_feature(CPU_FTR_ARCH_300)) + return; + + cpu = of_find_node_by_type(NULL, "cpu"); + + if (of_property_read_u32(cpu, "ibm,dec-bits", &bits) == 0) { + if (bits > 64 || bits < 32) { + pr_warn("time_init: firmware supplied invalid ibm,dec-bits"); + bits = 32; + } + + /* calculate the signed maximum given this many bits */ + decrementer_max = (1ul << (bits - 1)) - 1; + } + + of_node_put(cpu); + + pr_info("time_init: %u bit decrementer (max: %llx)\n", + bits, decrementer_max); +} + static void __init init_decrementer_clockevent(void) { int cpu = smp_processor_id(); @@ -899,7 +970,7 @@ static void __init init_decrementer_clockevent(void) clockevents_calc_mult_shift(&decrementer_clockevent, ppc_tb_freq, 4); decrementer_clockevent.max_delta_ns = - clockevent_delta2ns(DECREMENTER_MAX, &decrementer_clockevent); + clockevent_delta2ns(decrementer_max, &decrementer_clockevent); decrementer_clockevent.min_delta_ns = clockevent_delta2ns(2, &decrementer_clockevent); @@ -908,6 +979,9 @@ static void __init init_decrementer_clockevent(void) void secondary_cpu_time_init(void) { + /* Enable and test the large decrementer for this cpu */ + enable_large_decrementer(); + /* Start the decrementer on CPUs that have manual control * such as BookE */ @@ -973,6 +1047,10 @@ void __init time_init(void) vdso_data->tb_update_count = 0; vdso_data->tb_ticks_per_sec = tb_ticks_per_sec; + /* initialise and enable the large decrementer (if we have one) */ + set_decrementer_max(); + enable_large_decrementer(); + /* Start the decrementer on CPUs that have manual control * such as BookE */ diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index b7019b559ddb..298afcf3bf2a 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -338,8 +338,6 @@ _GLOBAL(__tm_recheckpoint) */ subi r7, r7, STACK_FRAME_OVERHEAD - SET_SCRATCH0(r1) - mfmsr r6 /* R4 = original MSR to indicate whether thread used FP/Vector etc. */ @@ -468,6 +466,7 @@ restore_gprs: * until we turn MSR RI back on. */ + SET_SCRATCH0(r1) ld r5, -8(r1) ld r1, -16(r1) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 9229ba63c370..f7e2f2e318bd 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -60,6 +60,7 @@ #include <asm/switch_to.h> #include <asm/tm.h> #include <asm/debug.h> +#include <asm/asm-prototypes.h> #include <sysdev/fsl_pci.h> #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC) @@ -1376,6 +1377,7 @@ void facility_unavailable_exception(struct pt_regs *regs) [FSCR_TM_LG] = "TM", [FSCR_EBB_LG] = "EBB", [FSCR_TAR_LG] = "TAR", + [FSCR_LM_LG] = "LM", }; char *facility = "unknown"; u64 value; @@ -1418,7 +1420,8 @@ void facility_unavailable_exception(struct pt_regs *regs) rd = (instword >> 21) & 0x1f; current->thread.dscr = regs->gpr[rd]; current->thread.dscr_inherit = 1; - mtspr(SPRN_FSCR, value | FSCR_DSCR); + current->thread.fscr |= FSCR_DSCR; + mtspr(SPRN_FSCR, current->thread.fscr); } /* Read from DSCR (mfspr RT, 0x03) */ @@ -1432,6 +1435,14 @@ void facility_unavailable_exception(struct pt_regs *regs) emulate_single_step(regs); } return; + } else if ((status == FSCR_LM_LG) && cpu_has_feature(CPU_FTR_ARCH_300)) { + /* + * This process has touched LM, so turn it on forever + * for this process + */ + current->thread.fscr |= FSCR_LM; + mtspr(SPRN_FSCR, current->thread.fscr); + return; } if ((status < ARRAY_SIZE(facility_strings)) && diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S index 1c2e7a343bf5..616a6d854638 100644 --- a/arch/powerpc/kernel/vector.S +++ b/arch/powerpc/kernel/vector.S @@ -70,10 +70,11 @@ _GLOBAL(load_up_altivec) MTMSRD(r5) /* enable use of AltiVec now */ isync - /* Hack: if we get an altivec unavailable trap with VRSAVE - * set to all zeros, we assume this is a broken application - * that fails to set it properly, and thus we switch it to - * all 1's + /* + * While userspace in general ignores VRSAVE, glibc uses it as a boolean + * to optimise userspace context save/restore. Whenever we take an + * altivec unavailable exception we must set VRSAVE to something non + * zero. Set it to all 1s. See also the programming note in the ISA. */ mfspr r4,SPRN_VRSAVE cmpwi 0,r4,0 diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 2dd91f79de05..b5fba689fca6 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -165,7 +165,7 @@ SECTIONS . = ALIGN(8); .dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET) { -#ifdef CONFIG_RELOCATABLE_PPC32 +#ifdef CONFIG_PPC32 __dynamic_symtab = .; #endif *(.dynsym) |