diff options
Diffstat (limited to 'mpi/hppa/README')
-rw-r--r-- | mpi/hppa/README | 84 |
1 files changed, 0 insertions, 84 deletions
diff --git a/mpi/hppa/README b/mpi/hppa/README deleted file mode 100644 index 5a2d5fd97..000000000 --- a/mpi/hppa/README +++ /dev/null @@ -1,84 +0,0 @@ -This directory contains mpn functions for various HP PA-RISC chips. Code -that runs faster on the PA7100 and later implementations, is in the pa7100 -directory. - -RELEVANT OPTIMIZATION ISSUES - - Load and Store timing - -On the PA7000 no memory instructions can issue the two cycles after a store. -For the PA7100, this is reduced to one cycle. - -The PA7100 has a lookup-free cache, so it helps to schedule loads and the -dependent instruction really far from each other. - -STATUS - -1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the - instructions bwlow (but some sw pipelining is needed to avoid the - xmpyu-fstds delay): - - fldds s1_ptr - - xmpyu - fstds N(%r30) - xmpyu - fstds N(%r30) - - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - - addc - stws res_ptr - addc - stws res_ptr - - addib Loop - -2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb - (asymptotically) on the PA7100, using the instructions below. With proper - sw pipelining and the unrolling level below, the speed becomes 8 - cycles/limb. - - fldds s1_ptr - fldds s1_ptr - - xmpyu - fstds N(%r30) - xmpyu - fstds N(%r30) - xmpyu - fstds N(%r30) - xmpyu - fstds N(%r30) - - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - ldws N(%r30) - addc - addc - addc - addc - addc %r0,%r0,cy-limb - - ldws res_ptr - ldws res_ptr - ldws res_ptr - ldws res_ptr - add - stws res_ptr - addc - stws res_ptr - addc - stws res_ptr - addc - stws res_ptr - - addib |