Diffstat (limited to 'mpi/hppa/README')
-rw-r--r-- mpi/hppa/README 84
1 files changed, 0 insertions, 84 deletions
diff --git a/mpi/hppa/README b/mpi/hppa/README
deleted file mode 100644
index 5a2d5fd97..000000000
--- a/mpi/hppa/README
+++ /dev/null
@@ -1,84 +0,0 @@
-This directory contains mpn functions for various HP PA-RISC chips. Code
-that runs faster on the PA7100 and later implementations is in the pa7100
-directory.
-
-RELEVANT OPTIMIZATION ISSUES
-
- Load and Store timing
-
-On the PA7000, no memory instruction can issue in the two cycles after a store.
-On the PA7100, this is reduced to one cycle.
-
-The PA7100 has a lockup-free cache, so it pays to schedule each load as far
-as possible ahead of the instruction that uses its result.
-
-STATUS
-
-1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
- instructions below (but some software pipelining is needed to avoid the
- xmpyu-fstds delay):
-
- fldds s1_ptr
-
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
-
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
-
- addc
- stws res_ptr
- addc
- stws res_ptr
-
- addib Loop
-
-2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
- (asymptotically) on the PA7100, using the instructions below. With proper
- software pipelining at the unrolling level shown, the speed becomes 8
- cycles/limb.
-
- fldds s1_ptr
- fldds s1_ptr
-
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
-
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- addc
- addc
- addc
- addc
- addc %r0,%r0,cy-limb
-
- ldws res_ptr
- ldws res_ptr
- ldws res_ptr
- ldws res_ptr
- add
- stws res_ptr
- addc
- stws res_ptr
- addc
- stws res_ptr
- addc
- stws res_ptr
-
- addib
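For reference, here is a minimal portable C sketch of what the two routines in the STATUS notes compute, assuming 32-bit limbs (as on PA-RISC 1.x) stored least significant first. The names ref_mul_1, ref_addmul_1 and limb_t are hypothetical; they are not the libgcrypt or GMP entry points, and this is not the PA7100 instruction schedule discussed above. Each 64-bit product plays the role of one xmpyu. In the listings that product makes a round trip through the stack (fstds to N(%r30), then ldws of its two halves) because xmpyu executes in the floating-point unit. The carry propagation corresponds to the addc chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference code, not the libgcrypt/GMP API.
   Limbs are 32 bits (PA-RISC 1.x), stored least significant first. */
typedef uint32_t limb_t;

/* res[0..n-1] = s1[0..n-1] * v; returns the carry-out (high) limb. */
static limb_t ref_mul_1(limb_t *res, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 unsigned product, the job of one xmpyu.
           (2^32-1)^2 + (2^32-1) still fits in 64 bits. */
        uint64_t p = (uint64_t)s1[i] * v + cy;
        res[i] = (limb_t)p;        /* low word of the product */
        cy = (limb_t)(p >> 32);    /* high word carries into the next limb */
    }
    return cy;
}

/* res[0..n-1] += s1[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *res, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so this sum cannot overflow. */
        uint64_t p = (uint64_t)s1[i] * v + res[i] + cy;
        res[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}
```

For example, with s1 = {0xFFFFFFFF, 0xFFFFFFFF} and v = 0xFFFFFFFF, ref_mul_1 stores {0x00000001, 0xFFFFFFFF} into res and returns 0xFFFFFFFE.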