summaryrefslogtreecommitdiffstats
path: root/drivers/clocksource
diff options
context:
space:
mode:
authorChen Jie <chenj@lemote.com>2015-03-26 18:07:24 +0100
committerRalf Baechle <ralf@linux-mips.org>2015-04-01 17:22:11 +0200
commit615eb603f4e1da4f151c7fac4aa175753a9913ec (patch)
tree5f879da5f72fd016a77522a3bfea67a0486ac42c /drivers/clocksource
parentMIPS: BCM47XX: Fix coding style to match kernel standards (diff)
downloadlinux-615eb603f4e1da4f151c7fac4aa175753a9913ec.tar.xz
linux-615eb603f4e1da4f151c7fac4aa175753a9913ec.zip
MIPS: csum_partial: Improve instruction parallelism.
Computing sum introduces true data dependency. This patch removes some true data depdendencies, hence increases instruction level parallelism. This patch brings up to 50% csum performance gain on Loongson 3a. One example about how this patch works is in CSUM_BIGCHUNK1: // ** original ** vs ** patch applied ** ADDC(sum, t0) ADDC(t0, t1) ADDC(sum, t1) ADDC(t2, t3) ADDC(sum, t2) ADDC(sum, t0) ADDC(sum, t3) ADDC(sum, t2) In the original implementation, each ADDC(sum, ...) depends on the sum value updated by previous ADDC(as source operand). With this patch applied, the first two ADDC operations are independent, hence can be executed simultaneously if possible. Another example is in the "copy and sum calculating chunk": // ** original ** vs ** patch applied ** STORE(t0, UNIT(0) ... STORE(t0, UNIT(0) ... ADDC(sum, t0) ADDC(t0, t1) STORE(t1, UNIT(1) ... STORE(t1, UNIT(1) ... ADDC(sum, t1) ADDC(sum, t0) STORE(t2, UNIT(2) ... STORE(t2, UNIT(2) ... ADDC(sum, t2) ADDC(t2, t3) STORE(t3, UNIT(3) ... STORE(t3, UNIT(3) ... ADDC(sum, t3) ADDC(sum, t2) With this patch applied, ADDC and the **next next** ADDC are independent. Signed-off-by: chenj <chenj@lemote.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9608/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Diffstat (limited to 'drivers/clocksource')
0 files changed, 0 insertions, 0 deletions