author Andy Polyakov <appro@openssl.org> 2007-05-10 08:48:28 +0200
committer Andy Polyakov <appro@openssl.org> 2007-05-10 08:48:28 +0200
commit 0bd8d6e2e1b9e534167a021340be4f815bfc5fb1
tree 94544ce1ef4c7fe5df0343d63f5e7067db0d5921 /crypto/sha
parent Detect UltraSPARC T1 in ./config.
Commentary updates to SHA for sparcv9.
Diffstat (limited to 'crypto/sha')
-rw-r--r-- crypto/sha/asm/sha1-sparcv9.pl 16
-rw-r--r-- crypto/sha/asm/sha512-sparcv9.pl 11
2 files changed, 19 insertions, 8 deletions
diff --git a/crypto/sha/asm/sha1-sparcv9.pl b/crypto/sha/asm/sha1-sparcv9.pl
index 9f2d159514..8306fc88cc 100644
--- a/crypto/sha/asm/sha1-sparcv9.pl
+++ b/crypto/sha/asm/sha1-sparcv9.pl
@@ -8,13 +8,15 @@
# ====================================================================
# Performance improvement is not really impressive on pre-T1 CPU: +8%
-# over Sun C and +25% over gcc [3.3]. While on T1, ... And there
-# is a gimmick. X[16] vector is packed to 8 64-bit registers and as
-# result nothing is spilled on stack. In addition input data is loaded
-# in compact instruction sequence, thus minimizing the window when the
-# code is subject to [inter-thread] cache-thrashing hazard. The goal
-# is to ensure scalability on UltraSPARC T1, or rather to avoid decay
-# when amount of active threads exceeds the number of physical cores.
+# over Sun C and +25% over gcc [3.3]. While on T1, a.k.a. Niagara, it
+# turned out to be 40% faster than 64-bit code generated by Sun C 5.8
+# and >2x faster than 64-bit code generated by gcc 3.4. And there is a
+# gimmick. The X[16] vector is packed into 8 64-bit registers, and as
+# a result nothing is spilled on the stack. In addition, input data is
+# loaded in a compact instruction sequence, thus minimizing the window
+# when the code is subject to the [inter-thread] cache-thrashing hazard.
+# The goal is to ensure scalability on UltraSPARC T1, or rather to avoid
+# decay when the number of active threads exceeds that of physical cores.
$bits=32;
for (@ARGV) { $bits=64 if (/\-m64/ || /\-xarch\=v9/); }
diff --git a/crypto/sha/asm/sha512-sparcv9.pl b/crypto/sha/asm/sha512-sparcv9.pl
index bd9afcb115..25f80390ac 100644
--- a/crypto/sha/asm/sha512-sparcv9.pl
+++ b/crypto/sha/asm/sha512-sparcv9.pl
@@ -23,7 +23,16 @@
#
# SHA512 on UltraSPARC T1.
#
-# ...
+# It's not any faster than 64-bit code generated by Sun C 5.8. This is
+# because the 64-bit code generator has the advantage of using 64-bit
+# loads to access X[16], which I consciously traded for 32-/64-bit ABI
+# duality [as per above]. But it surpasses 32-bit Sun C generated code
+# by 60%, not to mention that it doesn't suffer from severe decay when
+# running 4 times as many threads as physical cores, and that it leaves
+# gcc [3.4] behind by over a factor of 4! Compared to SHA256, its
+# single-thread performance is only 10% better, but overall throughput
+# with the maximum number of threads on a given CPU exceeds SHA256's by
+# 30% [again, the optimal coefficient is 50%].
$bits=32;
for (@ARGV) { $bits=64 if (/\-m64/ || /\-xarch\=v9/); }