summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/traps.c
diff options
context:
space:
mode:
authorCyril Bur <cyrilbur@gmail.com>2017-11-02 04:09:05 +0100
committerMichael Ellerman <mpe@ellerman.id.au>2017-11-06 10:39:33 +0100
commiteb5c3f1c86470fc1a57ab28cce15c12e4d6cdf8b (patch)
tree0d865f26543e4cc4ec7cca183c0aaddbfbf426a4 /arch/powerpc/kernel/traps.c
parentpowerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable e... (diff)
downloadlinux-eb5c3f1c86470fc1a57ab28cce15c12e4d6cdf8b.tar.xz
linux-eb5c3f1c86470fc1a57ab28cce15c12e4d6cdf8b.zip
powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has lead to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This mean that the process will be unable to complete its transaction and be forced to its failure handler. This is already an out if line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very long[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact. Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow, code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation. This patch does mean that some times the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and transactional facility unavailable exception handler. 0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diffstat (limited to 'arch/powerpc/kernel/traps.c')
-rw-r--r--arch/powerpc/kernel/traps.c26
1 files changed, 6 insertions, 20 deletions
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e099e61c702b..4e5a543d0b74 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1693,7 +1693,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
* If VMX is in use, the VRs now hold checkpointed values,
* so we don't want to load the VRs from the thread_struct.
*/
- tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
+ tm_recheckpoint(&current->thread);
/* If VMX is in use, get the transactional values back */
if (orig_msr & MSR_VEC) {
@@ -1721,7 +1721,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
regs->nip, regs->msr);
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
- tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
+ tm_recheckpoint(&current->thread);
current->thread.used_vr = 1;
if (orig_msr & MSR_FP) {
@@ -1733,8 +1733,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
void vsx_unavailable_tm(struct pt_regs *regs)
{
- unsigned long orig_msr = regs->msr;
-
/* See the comments in fp_unavailable_tm(). This works similarly,
* though we're loading both FP and VEC registers in here.
*
@@ -1748,29 +1746,17 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.used_vsr = 1;
- /* If FP and VMX are already loaded, we have all the state we need */
- if ((orig_msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
- regs->msr |= MSR_VSX;
- return;
- }
-
/* This reclaims FP and/or VR regs if they're already enabled */
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
current->thread.load_fp = 1;
- /* This loads & recheckpoints FP and VRs; but we have
- * to be sure not to overwrite previously-valid state.
- */
- tm_recheckpoint(&current->thread, orig_msr | MSR_FP | MSR_VEC);
-
- msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC));
+ tm_recheckpoint(&current->thread);
- if (orig_msr & MSR_FP)
- load_fp_state(&current->thread.fp_state);
- if (orig_msr & MSR_VEC)
- load_vr_state(&current->thread.vr_state);
+ msr_check_and_set(MSR_FP | MSR_VEC);
+ load_fp_state(&current->thread.fp_state);
+ load_vr_state(&current->thread.vr_state);
}
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */