linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	powerpc/perf: Adds support for programming of Thresholding in P10	Kajol Jain	2021-02-11	11	-34/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thresholding, a performance monitoring unit feature, can be used to identify marked instructions which take more than expected cycles between start event and end event. Threshold compare (thresh_cmp) bits are programmed in MMCRA register. In Power9, thresh_cmp bits were part of the event code. But in case of P10, thresh_cmp are not part of event code due to inclusion of MMCR3 bits. Patch here adds an option to use attr.config1 variable to be used to pass thresh_cmp value to be programmed in MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1 has been added and this flag is used to notify the use of attr.config1 variable. Patch has extended the parameter list of 'compute_mmcr', to include power_pmu's 'flags' element and parameter list of get_constraint to include attr.config1 value. It also extend parameter list of power_check_constraints inorder to pass perf_event list. As stated by commit ef0e3b650f8d ("powerpc/perf: Fix Threshold Event Counter Multiplier width for P10"), constraint bits for thresh_cmp is also needed to be increased to 11 bits, which is handled as part of this patch. We added bit number 53 as part of constraint bits of thresh_cmp for power10 to make it an 11 bit field. Updated layout for p10: /* * Layout of constraint bits: * * 60 56 52 48 44 40 36 32 * \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| * [ fab_match ] [ thresh_cmp ] [ thresh_ctl ] [ ] * \| \| * [ thresh_cmp bits for p10] thresh_sel -* * * 28 24 20 16 12 8 4 0 * \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| - - - - \| * [ ] \| [ ] \| [ sample ] [ ] [6] [5] [4] [3] [2] [1] * \| \| \| \| \| * BHRB IFM -* \| \| \|radix_scope \| Count of events for each PMC. EBB -* \| \| p1, p2, p3, p4, p5, p6. * L1 I/D qualifier -* \| * nc - number of counters -* * * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints * we want the low bit of each field to be added to any existing value. * * Everything else is a value field. */ Result: command#: cat /sys/devices/cpu/format/thresh_cmp config1:0-17 ex. usage: command#: perf record -I --weight -d -e cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096 1826636 records/s real 2.00 s user 2.00 s sys 0.00 s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ] Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209095234.837356-1-kjain@linux.ibm.com
*	powerpc/pci: Remove unimplemented prototypes	Oliver O'Halloran	2021-02-11	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	The corresponding definitions were deleted in commit 3d5134ee8341 ("[POWERPC] Rewrite IO allocation & mapping on powerpc64") which was merged a mere 13 years ago. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902035138.1762531-1-oohall@gmail.com
*	powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()	Christophe Leroy	2021-02-11	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Since commit 17bc43367fc2 ("powerpc/uaccess: Implement unsafe_copy_to_user() as a simple loop"), raw_copy_to_user_allowed() is only used by raw_copy_to_user(). Merge raw_copy_to_user_allowed() into raw_copy_to_user(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3ae114740317187e12edbd5ffa9157cb8c396dea.1612879284.git.christophe.leroy@csgroup.eu
*	powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()	Christophe Leroy	2021-02-11	1	-7/+3
\| \| \| \| \| \| \| \| \| \|	__put_user_size_allowed() is only called from __put_user_size() now. Merge them together. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b3baeaec1ee2fbdc653bb6fb27b0be5b846163ef.1612879284.git.christophe.leroy@csgroup.eu
*	powerpc/uaccess: get rid of small constant size cases in ↵	Christophe Leroy	2021-02-11	1	-41/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	raw_copy_{to,from}_user() Copied from commit 4b842e4e25b1 ("x86: get rid of small constant size cases in raw_copy_{to,from}_user()") Very few call sites where that would be triggered remain, and none of those is anywhere near hot enough to bother. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99d4ccb58a20d8408d0e19874393655ad5b40822.1612879284.git.christophe.leroy@csgroup.eu
*	powerpc/64: Fix stack trace not displaying final frame	Michael Ellerman	2021-02-11	3	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit bf13718bc57a ("powerpc: show registers when unwinding interrupt frames") we changed our stack dumping logic to show the full registers whenever we find an interrupt frame on the stack. However we didn't notice that on 64-bit this doesn't show the final frame, ie. the interrupt that brought us in from userspace, whereas on 32-bit it does. That is due to confusion about the size of that last frame. The code in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE to check the sp is at least that far below the top of the stack. However on 64-bit that size is too large for the final frame, because it includes the red zone, but we don't allocate a red zone for the first frame. So add a new define that encodes the correct size for 32-bit and 64-bit, and use it in show_stack(). This results in the full trace being shown on 64-bit, eg: sysrq: Trigger a crash Kernel panic - not syncing: sysrq triggered crash CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649 Call Trace: [c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable) [c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c [c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50 [c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210 [c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188 [c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0 [c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360 [c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140 [c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0 [c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff9fbab428 NIP: 00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000 REGS: c00000000a1c3e80 TRAP: 0c00 Not tainted (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty) MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 22002884 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001 GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063 GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0 GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000 GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0 GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001 NIP [00007fff9fbab428] 0x7fff9fbab428 LR [000000001000b724] 0x1000b724 --- interrupt: c00 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209141627.2898485-1-mpe@ellerman.id.au
*	powerpc/time: Remove get_tbl()	Christophe Leroy	2021-02-11	1	-6/+0
\| \| \| \| \| \| \| \|	There are no more users of get_tbl(). Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dd0368bfd497ffe06b31ee1b5f2ebf7760e30900.1612866360.git.christophe.leroy@csgroup.eu
*	powerpc/time: Avoid using get_tbl()	Christophe Leroy	2021-02-11	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	get_tbl() is confusing as it returns the content TBL register on PPC32 but the concatenation of TBL and TBU on PPC64. Use mftb() instead. This will allow the removal of get_tbl() in a following patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/decefb47c8a2070bf55d20b096b813908c7b3110.1612866360.git.christophe.leroy@csgroup.eu
*	spi: mpc52xx: Avoid using get_tbl()	Christophe Leroy	2021-02-11	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_tbl() is confusing as it returns the content TBL register on PPC32 but the concatenation of TBL and TBU on PPC64. Use mftb() instead. This will allow the removal of get_tbl() in a following patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99bf008e2970de7f8ed3225cda69a6d06ae1a644.1612866360.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Avoid storing 'current' in another pointer	Christophe Leroy	2021-02-11	1	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By saving the pointer pointing to thread_info.flags, gcc copies r2 in a non-volatile register. We know 'current' doesn't change, so avoid that intermediaite pointer. Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles) On PPC64, gcc seems to know that 'current' is not changing, and it keeps it in a non volatile register to avoid multiple read of 'current' in paca. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ad0363ff0ff8c125f40e1cdc589a85bbd7e31693.1612946484.git.christophe.leroy@csgroup.eu
*	powerpc/32: Handle bookE debugging in C in syscall entry/exit	Christophe Leroy	2021-02-11	6	-55/+42
\| \| \| \| \| \| \| \| \| \| \|	The handling of SPRN_DBCR0 and other registers can easily be done in C instead of ASM. For that, create booke_load_dbcr0() and booke_restore_dbcr0(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1a7515f9258b27a9177de88491a8bb79b255ceb7.1612898425.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Do not check unsupported scv vector on PPC32	Christophe Leroy	2021-02-11	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Only book3s/64 has scv. No need to check the 0x7ff0 trap on 32 or 64e. For that, add a helper trap_is_unsupported_scv() similar to trap_is_scv(). And ignore the scv parameter in syscall_exit_prepare (Save 14 cycles 346 => 332 cycles) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fb87b205ae8eb8c623f33bb316801acf95a831e6.1612898425.git.christophe.leroy@csgroup.eu
*	powerpc/32: Remove the counter in global_dbcr0	Christophe Leroy	2021-02-11	3	-16/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	global_dbcr0 has two parts, 4 bytes to save/restore the value of SPRN_DBCR0, and 4 bytes that are incremented/decremented everytime something is saving/loading the above value. This counter is only incremented/decremented, its value is never used and never read. Remove the counter and devide the size of global_dbcr0 by 2. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7e381dc58b3f583556cfab37ba5d813bfd5cce1e.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry	Christophe Leroy	2021-02-11	3	-36/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	system_call_exception() checks MSR_PR and BUGs if a syscall is issued from kernel mode. No need to handle it anymore from the ASM entry code. null_syscall reduction 2 cycles (348 => 346 cycles) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1eddb42cb12092b1e3d72608d182c365db3da41d.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: implement system call entry/exit logic in C for PPC32	Christophe Leroy	2021-02-11	3	-229/+30
\| \| \| \| \| \| \| \| \| \| \| \|	That's port of PPC64 syscall entry/exit logic in C to PPC32. Performancewise on 8xx: Before : 304 cycles on null_syscall After : 348 cycles on null_syscall Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a93b08e1275e9d1f0b1c39043d1b827586b2b401.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32: Always save non volatile GPRs at syscall entry	Christophe Leroy	2021-02-11	4	-62/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation for porting syscall entry/exit to C, inconditionally save non volatile general purpose registers. Commit 965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation") provides detailed explanation. This increases the number of cycles by 24 cycles on 8xx with null_syscall benchmark (280 => 304 cycles) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/21c08162b83655195fe9ead78ff2cfd28508d023.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Change condition to check MSR_RI	Christophe Leroy	2021-02-11	1	-3/+4
\| \| \| \| \| \| \| \| \| \|	In system_call_exception(), MSR_RI also needs to be checked on 8xx. Only booke and 40x doesn't have MSR_RI. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/67820fada8dd6a8fe9d7b666f175d4cc9d8de87e.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Save r3 in regs->orig_r3	Christophe Leroy	2021-02-11	2	-2/+2
\| \| \| \| \| \| \| \| \|	Save r3 in regs->orig_r3 in system_call_exception() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9a90805ab6b9101b46daf56470f457a57acd86fc.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Use is_compat_task()	Christophe Leroy	2021-02-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	Instead of hard comparing task flags with _TIF_32BIT, use is_compat_task(). The advantage is that it returns 0 on PPC32 allthough _TIF_32BIT is always set. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c8094662199337a7200fea9f6e1d1f8b1b6d5f69.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Make interrupt.c buildable on PPC32	Christophe Leroy	2021-02-11	2	-9/+26
\| \| \| \| \| \| \| \| \| \| \| \|	To allow building interrupt.c on PPC32, ifdef out specific PPC64 code or use helpers which are available on both PP32 and PPC64 Modify Makefile to always build interrupt.o Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ba073ad67bd971a88ce331b65d6655523b54c794.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/syscall: Rename syscall_64.c into interrupt.c	Christophe Leroy	2021-02-11	2	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	syscall_64.c will be reused almost as is for PPC32. As this file also contains functions to handle other types of interrupts rename it interrupt.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cddc2deaa8f049d3ec419738e69804934919b935.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/irq: Add stub irq_soft_mask_return() for PPC32	Christophe Leroy	2021-02-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	To allow building syscall_64.c smoothly on PPC32, add stub version of irq_soft_mask_return(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9b9f62c5e2e63cc121fd749a923aaaee92ee0da4.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/irq: Rework helpers that manipulate MSR[EE/RI]	Christophe Leroy	2021-02-11	2	-24/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation of porting PPC32 to C syscall entry/exit, rewrite the following helpers as static inline functions and add support for PPC32 in them: __hard_irq_enable() __hard_irq_disable() __hard_EE_RI_disable() __hard_RI_enable() Then use them in PPC32 version of arch_local_irq_disable() and arch_local_irq_enable() to avoid code duplication. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0e290372a0e7dc2ae657b4a01aec85f8de7fdf77.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/irq: Add helper to set regs->softe	Christophe Leroy	2021-02-11	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \|	regs->softe doesn't exist on PPC32. Add irq_soft_mask_regs_set_state() helper to set regs->softe. This helper will void on PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5f37d1177a751fdbca79df461d283850ca3a34a2.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32: Reorder instructions to avoid using CTR in syscall entry	Christophe Leroy	2021-02-11	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we are using rfi instead of mtmsr to reactivate MMU, it is possible to reorder instructions and avoid the need to use CTR for stashing SRR0. null_syscall on 8xx is reduced by 3 cycles (283 => 280 cycles). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8fa13a59f73647e058c95fc7e1c7a98f316bd20a.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32: On syscall entry, enable instruction translation at the same ↵	Christophe Leroy	2021-02-11	2	-22/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	time as data On 40x and 8xx, kernel text is pinned. On book3s/32, kernel text is mapped by BATs. Enable instruction translation at the same time as data translation, it makes things simpler. MSR_RI can also be set at the same time because srr0/srr1 are already saved and r1 is set properly. On booke, translation is always on, so at the end all PPC32 have translation on early. This reduces null_syscall benchmark by 13 cycles on 8xx (296 ==> 283 cycles). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3fe8891c814103a3549efc1d4e7ffc828bba5993.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32: Always enable data translation on syscall entry	Christophe Leroy	2021-02-11	2	-24/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the code can use a stack in vm area, it can also use a stack in linear space. Simplify code by removing old non VMAP stack code on PPC32 in syscall. That means the data translation is now re-enabled early in syscall entry in all cases, not only when using VMAP stacks. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/412c6c1786922d991bbb89c2ad2e82cffe8ab112.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/32s: Add missing call to kuep_lock on syscall entry	Christophe Leroy	2021-02-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace Execution protection and fast syscall entry were implemented independently from each other and were both merged in kernel 5.2, leading to syscall entry missing userspace execution protection. On syscall entry, execution of user space memory must be locked in the same way as on exception entry. Fixes: b86fb88855ea ("powerpc/32: implement fast entry for syscalls on non BOOKE") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c65e105b63aaf74f91a14f845bc77192350b84a6.1612796617.git.christophe.leroy@csgroup.eu
*	powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode	Will Springer	2021-02-11	1	-21/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Swap upper/lower 32 bits for 64-bit compat syscalls, conditioned on endianness. This is modeled after the same functionality in arch/mips/kernel/linux32.c. This fixes compat_sys on ppc64le, when called by 32-bit little-endian processes. Tested with `file /bin/bash` (pread64) and `truncate -s 5G test` (ftruncate64). Signed-off-by: Will Springer <skirmisher@protonmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2765111.e9J7NaK4W3@sheen
*	powerpc: use kernel endianness in MSR in 32-bit signal handler	Joseph J Allen	2021-02-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This mirrors the behavior in handle_rt_signal32, to obey kernel endianness rather than assume a 32-bit process is big-endian. Without this change, any 32-bit little-endian process will SIGILL immediately upon handling a signal. Signed-off-by: Joseph J Allen <eerykitty@gmail.com> Signed-off-by: Will Springer <skirmisher@protonmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2058876.irdbgypaU6@sheen
*	powerpc/kexec_file: fix FDT size estimation for kdump kernel	Hari Bathini	2021-02-11	3	-1/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On systems with large amount of memory, loading kdump kernel through kexec_file_load syscall may fail with the below error: "Failed to update fdt with linux,drconf-usable-memory property" This happens because the size estimation for kdump kernel's FDT does not account for the additional space needed to setup usable memory properties. Fix it by accounting for the space needed to include linux,usable-memory & linux,drconf-usable-memory properties while estimating kdump kernel's FDT size. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161243826811.119001.14083048209224609814.stgit@hbathini
*	powerpc/mm: Remove dcache flush from memory remove.	Aneesh Kumar K.V	2021-02-11	2	-22/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We added dcache flush on memory add/remove in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") to handle crashes on GPU hotplug. Instead of adding dcache flush in generic memory add/remove routine which is used even for regular memory, we should handle these devices specific flush in the device driver code. memtrace did handle this in the driver and that was removed by commit 7fd6641de28f ("powerpc/powernv/memtrace: Let the arch hotunplug code flush cache"). This patch reverts that commit. The dcache flush in memory add was removed by commit ea458effa88e ("powerpc: Don't flush caches when adding memory") which I don't think is correct. The reason why we require dcache flush in memtrace is to make sure we don't have a dirty cache when we remap a pfn to cache inhibited. We should do that when the memtrace module removes the memory and make the pfn available for HTM traces to map it as cache inhibited. The other device mentioned in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") is nvlink device with coherent memory. The support for that was removed in commit 7eb3cf761927 ("powerpc/powernv: remove unused NPU DMA code") and commit 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-3-aneesh.kumar@linux.ibm.com
*	powerpc/mm: Add PG_dcache_clean to indicate dcache clean state	Aneesh Kumar K.V	2021-02-11	5	-13/+19
\| \| \| \| \| \| \| \|	This just add a better name for PG_arch_1. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com
*	powerpc/mm: Enable compound page check for both THP and HugeTLB	Aneesh Kumar K.V	2021-02-11	3	-26/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	THP config results in compound pages. Make sure the kernel enables the PageCompound() check with CONFIG_HUGETLB_PAGE disabled and CONFIG_TRANSPARENT_HUGEPAGE enabled. This makes sure we correctly flush the icache with THP pages. flush_dcache_icache_page only matter for platforms that don't support COHERENT_ICACHE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-1-aneesh.kumar@linux.ibm.com
*	powerpc/xive: Assign boolean values to a bool variable	Jiapeng Chong	2021-02-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the following coccicheck warnings: ./arch/powerpc/kvm/book3s_xive.c:1856:2-17: WARNING: Assignment of 0/1 to bool variable. ./arch/powerpc/kvm/book3s_xive.c:1854:2-17: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612680192-43116-1-git-send-email-jiapeng.chong@linux.alibaba.com
*	powerpc/32: Preserve cr1 in exception prolog stack check to fix build error	Christophe Leroy	2021-02-11	2	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1 Maximum PAGE_SHIFT is 18 for 256k pages so THREAD_ALIGN_SHIFT is 19 at the maximum. No need to clobber cr1, it can be preserved when moving r1 into CR when we check stack overflow. This reduces the number of instructions in Machine Check Exception prolog and fixes a build failure reported by the kernel test robot on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That build failure is due to too many instructions in the prolog hence not fitting between 0x200 and 0x300. Allthough the problem doesn't show up in mainline, it is still worth the change. Fixes: 98bf2d3f4970 ("powerpc/32s: Fix RTAS machine check with VMAP stack") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.leroy@csgroup.eu
*	powerpc/64s: Remove EXSLB interrupt save area	Nicholas Piggin	2021-02-11	3	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	SLB faults should not be taken while the PACA save areas are live, all memory accesses should be fetches from the kernel text, and access to PACA and the current stack, before C code is called or any other accesses are made. All of these have pinned SLBs so will not take a SLB fault. Therefore EXSLB is not be required. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208063406.331655-1-npiggin@gmail.com
*	powerpc/64s: syscall real mode entry use mtmsrd rather than rfid	Nicholas Piggin	2021-02-11	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Have the real mode system call entry handler branch to the kernel 0xc000... address and then use mtmsrd to enable the MMU, rather than use SRRs and rfid. Commit 8729c26e675c ("powerpc/64s/exception: Move real to virt switch into the common handler") implemented this style of real mode entry for other interrupt handlers, so this brings system calls into line with them, which is the main motivcation for the change. This tends to be slightly faster due to avoiding the mtsprs, and it also does not clobber the SRR registers, which becomes important in a subsequent change. The real mode entry points don't tend to be too important for performance these days, but it is possible for a hypervisor to run guests in AIL=0 mode for certian reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208063326.331502-1-npiggin@gmail.com
*	powerpc/kuap: Restore AMR after replaying soft interrupts	Alexey Kardashevskiy	2021-02-11	1	-1/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection"), user access helpers call user_{read\|write}_access_{begin\|end} when user space access is allowed. Commit 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") made the mentioned helpers program a AMR special register to allow such access for a short period of time, most of the time AMR is expected to block user memory access by the kernel. Since the code accesses the user space memory, unsafe_get_user() calls might_fault() which calls arch_local_irq_restore() if either CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP is enabled. arch_local_irq_restore() then attempts to replay pending soft interrupts as KUAP regions have hardware interrupts enabled. If a pending interrupt happens to do user access (performance interrupts do that), it enables access for a short period of time so after returning from the replay, the user access state remains blocked and if a user page fault happens - "Bug: Read fault blocked by AMR!" appears and SIGSEGV is sent. An example trace: Bug: Read fault blocked by AMR! WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24 NIP: c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000 REGS: c00000000dc63560 TRAP: 0700 Not tainted (5.10.0-rc6_v5.10-rc6_a+fstn1) MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28002888 XER: 20040000 CFAR: c0000000001fa928 IRQMASK: 1 GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027 GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001 GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0 GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000 GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730 NIP __do_page_fault+0xb38/0xde0 LR __do_page_fault+0xb34/0xde0 Call Trace: __do_page_fault+0xb34/0xde0 (unreliable) handle_page_fault+0x10/0x2c --- interrupt: 300 at strncpy_from_user+0x290/0x440 LR = strncpy_from_user+0x284/0x440 strncpy_from_user+0x2f0/0x440 (unreliable) getname_flags+0x88/0x2c0 do_sys_openat2+0x2d4/0x5f0 do_sys_open+0xcc/0x140 system_call_exception+0x160/0x240 system_call_common+0xf0/0x27c To fix it save/restore the AMR when replaying interrupts, and also add a check if AMR was not blocked prior to replaying interrupts. Originally found by syzkaller. Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use normal commit citation format and add full oops log to change log, move kuap_check_amr() into the restore routine to avoid warnings about unreconciled IRQ state] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202091541.36499-1-aik@ozlabs.ru
*	powerpc/uaccess: Avoid might_fault() when user access is enabled	Alexey Kardashevskiy	2021-02-11	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The amount of code executed with enabled user space access (unlocked KUAP) should be minimal. However with CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP enabled, might_fault() calls into various parts of the kernel, and may even end up replaying interrupts which in turn may access user space and forget to restore the KUAP state. The problem places are: 1. strncpy_from_user (and similar) which unlock KUAP and call unsafe_get_user -> __get_user_allowed -> __get_user_nocheck() with do_allow=false to skip KUAP as the caller took care of it. 2. __unsafe_put_user_goto() which is called with unlocked KUAP. eg: WARNING: CPU: 30 PID: 1 at arch/powerpc/include/asm/book3s/64/kup.h:324 arch_local_irq_restore+0x160/0x190 NIP arch_local_irq_restore+0x160/0x190 LR lock_is_held_type+0x140/0x200 Call Trace: 0xc00000007f392ff8 (unreliable) ___might_sleep+0x180/0x320 __might_fault+0x50/0xe0 filldir64+0x2d0/0x5d0 call_filldir+0xc8/0x180 ext4_readdir+0x948/0xb40 iterate_dir+0x1ec/0x240 sys_getdents64+0x80/0x290 system_call_exception+0x160/0x280 system_call_common+0xf0/0x27c Change __get_user_nocheck() to look at `do_allow` to decide whether to skip might_fault(). Since strncpy_from_user/etc call might_fault() anyway before unlocking KUAP, there should be no visible change. Drop might_fault() in __unsafe_put_user_goto() as it is only called from unsafe_put_user(), which already has KUAP unlocked. Since keeping might_fault() is still desirable for debugging, add calls to it in user_[read\|write]_access_begin(). That also allows us to drop the is_kernel_addr() test, because there should be no code using user_[read\|write]_access_begin() in order to access a kernel address. Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [mpe: Combine with related patch from myself, merge change logs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210204121612.32721-1-aik@ozlabs.ru
*	powerpc/uaccess: Simplify unsafe_put_user() implementation	Michael Ellerman	2021-02-11	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently unsafe_put_user() expands to __put_user_goto(), which expands to __put_user_nocheck_goto(). There are no other uses of __put_user_nocheck_goto(), and although there are some other uses of __put_user_goto() those could just use unsafe_put_user(). Every layer of indirection introduces the possibility that some code is calling that layer, and makes keeping track of the required semantics at each point more complicated. So drop __put_user_goto(), and rename __put_user_nocheck_goto() to __unsafe_put_user_goto(). The "nocheck" is implied by "unsafe". Replace the few uses of __put_user_goto() with unsafe_put_user(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208135717.2618798-1-mpe@ellerman.id.au
*	powerpc/amigaone: Make amigaone_discover_phbs() static	Michael Ellerman	2021-02-11	1	-1/+1
\| \| \| \| \| \| \| \| \|	It's only used in setup.c, so make it static. Fixes: 053d58c87029 ("powerpc/amigaone: Move PHB discovery") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-3-mpe@ellerman.id.au
*	powerpc/mm/64s: Fix no previous prototype warning	Michael Ellerman	2021-02-11	3	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported by lkp: arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous prototype for function 'exit_lazy_flush_tlb' Fix it by moving the prototype into the existing header. Fixes: 032b7f08932c ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
*	powerpc/83xx: Fix build error when CONFIG_PCI=n	Michael Ellerman	2021-02-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported by lkp: arch/powerpc/platforms/83xx/km83xx.c:183:19: error: 'mpc83xx_setup_pci' undeclared here (not in a function) 183 \| .discover_phbs = mpc83xx_setup_pci, \| ^~~~~~~~~~~~~~~~~ \| mpc83xx_setup_arch There is a stub defined for the CONFIG_PCI=n case, but now that mpc83xx_setup_pci() is being assigned to discover_phbs the correct empty value is NULL. Fixes: 83f84041ff1c ("powerpc/83xx: Move PHB discovery") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-1-mpe@ellerman.id.au
*	powerpc: remove interrupt handler functions from the noinstr section	Nicholas Piggin	2021-02-11	3	-15/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The allyesconfig ppc64 kernel fails to link with relocations unable to fit after commit 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers"), which is due to the interrupt handler functions being put into the .noinstr.text section, which the linker script places on the opposite side of the main .text section from the interrupt entry asm code which calls the handlers. This results in a lot of linker stubs that overwhelm the 252-byte sized space we allow for them, or in the case of BE a .opd relocation link error for some reason. It's not required to put interrupt handlers in the .noinstr section, previously they used NOKPROBE_SYMBOL, so take them out and replace with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes a number of interrupt handlers nokprobe that were not prior to the interrupt wrappers commit, but since that commit they were made nokprobe due to being in .noinstr.text, so this fix does not change that. The fixes tag is different to the commit that first exposes the problem because it is where the wrapper macros were introduced. Fixes: 8d41fc618ab8 ("powerpc: interrupt handler wrapper functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Slightly fix up comment wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com
*	powerpc/powernv/pci: Use kzalloc() for phb related allocations	Michael Ellerman	2021-02-11	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of commit fbbefb320214 ("powerpc/pci: Move PHB discovery for PCI_DN using platforms"), I switched some allocations from memblock_alloc() to kmalloc(), otherwise memblock would warn that it was being called after slab init. However I missed that the code relied on the allocations being zeroed, without which we could end up crashing: pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6af7 Faulting instruction address: 0xc0000000000dbc90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV ... NIP pnv_ioda_get_pe_state+0xe0/0x1d0 LR pnv_ioda_get_pe_state+0xb4/0x1d0 Call Trace: pnv_ioda_get_pe_state+0xb4/0x1d0 (unreliable) pnv_pci_config_check_eeh.isra.9+0x78/0x270 pnv_pci_read_config+0xf8/0x160 pci_bus_read_config_dword+0xa4/0x120 pci_bus_generic_read_dev_vendor_id+0x54/0x270 pci_scan_single_device+0xb8/0x140 pci_scan_slot+0x80/0x1b0 pci_scan_child_bus_extend+0x94/0x490 pcibios_scan_phb+0x1f8/0x3c0 pcibios_init+0x8c/0x12c do_one_initcall+0x94/0x510 kernel_init_freeable+0x35c/0x3fc kernel_init+0x2c/0x168 ret_from_kernel_thread+0x5c/0x70 Switch them to kzalloc(). Fixes: fbbefb320214 ("powerpc/pci: Move PHB discovery for PCI_DN using platforms") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210211112749.3410771-1-mpe@ellerman.id.au
*	powerpc/64s: Handle program checks in wrong endian during early boot	Michael Ellerman	2021-02-08	1	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's a short window during boot where although the kernel is running little endian, any exceptions will cause the CPU to switch back to big endian. This situation persists until we call configure_exceptions(), which calls either the hypervisor or OPAL to configure the CPU so that exceptions will be taken in little endian (via HID0[HILE]). We don't intend to take exceptions during early boot, but one way we sometimes do is via a WARN/BUG etc. Those all boil down to a trap instruction, which will cause a program check exception. The first instruction of the program check handler is an mtsprg, which when executed in the wrong endian is an lhzu with a ~3GB displacement from r3. The content of r3 is random, so that becomes a load from some random location, and depending on the system (installed RAM etc.) can easily lead to a checkstop, or an infinitely recursive page fault. That prevents whatever the WARN/BUG was complaining about being printed to the console, and the user just sees a dead system. We can fix it by having a trampoline at the beginning of the program check handler that detects we are in the wrong endian, and flips us back to the correct endian. We can't flip MSR[LE] using mtmsr (alas), so we have to use rfid. That requires backing up SRR0/1 as well as a GPR. To do that we use SPRG0/2/3 (SPRG1 is already used for the paca). SPRG3 is user readable, but this trampoline is only active very early in boot, and SPRG3 will be reinitialised in vdso_getcpu_init() before userspace starts. With this trampoline in place we can survive a WARN early in boot and print a stack trace, which is eventually printed to the console once the console is up, eg: [83565.758545] kexec_core: Starting new kernel [ 0.000000] ------------[ cut here ]------------ [ 0.000000] static_key_enable_cpuslocked(): static key '0xc000000000ea6160' used before call to jump_label_init() [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xfc/0x120 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-gcc-8.2.0-dirty #618 [ 0.000000] NIP: c0000000002fd46c LR: c0000000002fd468 CTR: c000000000170660 [ 0.000000] REGS: c000000001227940 TRAP: 0700 Not tainted (5.10.0-gcc-8.2.0-dirty) [ 0.000000] MSR: 9000000002823003 <SF,HV,VEC,VSX,FP,ME,RI,LE> CR: 24882422 XER: 20040000 [ 0.000000] CFAR: 0000000000000730 IRQMASK: 1 [ 0.000000] GPR00: c0000000002fd468 c000000001227bd0 c000000001228300 0000000000000065 [ 0.000000] GPR04: 0000000000000001 0000000000000065 c0000000010cf970 000000000000000d [ 0.000000] GPR08: 0000000000000000 0000000000000000 0000000000000000 c00000000122763f [ 0.000000] GPR12: 0000000000002000 c000000000f8a980 0000000000000000 0000000000000000 [ 0.000000] GPR16: 0000000000000000 0000000000000000 c000000000f88c8e c000000000f88c9a [ 0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] GPR24: 0000000000000000 c000000000dea3a8 0000000000000000 c000000000f35114 [ 0.000000] GPR28: 0000002800000000 c000000000f88c9a c000000000f88c8e c000000000ea6160 [ 0.000000] NIP [c0000000002fd46c] static_key_enable_cpuslocked+0xfc/0x120 [ 0.000000] LR [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 [ 0.000000] Call Trace: [ 0.000000] [c000000001227bd0] [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 (unreliable) [ 0.000000] [c000000001227c40] [c0000000002fd4c0] static_key_enable+0x30/0x50 [ 0.000000] [c000000001227c70] [c000000000f6629c] early_page_poison_param+0x58/0x9c [ 0.000000] [c000000001227cb0] [c000000000f351b8] do_early_param+0xa4/0x10c [ 0.000000] [c000000001227d30] [c00000000011e020] parse_args+0x270/0x5e0 [ 0.000000] [c000000001227e20] [c000000000f35864] parse_early_options+0x48/0x5c [ 0.000000] [c000000001227e40] [c000000000f358d0] parse_early_param+0x58/0x84 [ 0.000000] [c000000001227e70] [c000000000f3a368] early_init_devtree+0xc4/0x490 [ 0.000000] [c000000001227f10] [c000000000f3bca0] early_setup+0xc8/0x1c8 [ 0.000000] [c000000001227f90] [000000000000c320] 0xc320 [ 0.000000] Instruction dump: [ 0.000000] 4bfffddd 7c2004ac 39200001 913f0000 4bffffb8 7c651b78 3c82ffac 3c62ffc0 [ 0.000000] 38841b00 3863f310 4bdf03a5 60000000 <0fe00000> 4bffff38 60000000 60000000 [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] dt-cpu-ftrs: setup for ISA 3000 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-2-mpe@ellerman.id.au
*	powerpc/64: Make stack tracing work during very early boot	Michael Ellerman	2021-02-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we try to stack trace very early during boot, either due to a WARN/BUG or manual dump_stack(), we will oops in valid_emergency_stack() when we try to dereference the paca_ptrs array. The fix is simple, we just return false if paca_ptrs isn't allocated yet. The stack pointer definitely isn't part of any emergency stack because we haven't allocated any yet. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-1-mpe@ellerman.id.au
*	powerpc64/idle: Fix SP offsets when saving GPRs	Christopher M. Riedl	2021-02-08	1	-65/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idle entry/exit code saves/restores GPRs in the stack "red zone" (Protected Zone according to PowerPC64 ELF ABI v2). However, the offset used for the first GPR is incorrect and overwrites the back chain - the Protected Zone actually starts below the current SP. In practice this is probably not an issue, but it's still incorrect so fix it. Also expand the comments to explain why using the stack "red zone" instead of creating a new stackframe is appropriate here. Signed-off-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210206072342.5067-1-cmr@codefail.de
*	powerpc/32s: Allow constant folding in mtsr()/mfsr()	Christophe Leroy	2021-02-08	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	On the same way as we did in wrtee(), add an alternative using mtsr/mfsr instructions instead of mtsrin/mfsrin when the segment register can be determined at compile time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9baed0ff9d76723ec90f1b567ddd4ac1ecc7a190.1612612022.git.christophe.leroy@csgroup.eu