Denys Vlasenko
4923f74e58
libbb/sha1: shrink unrolled x86-64 code
function                             old     new   delta
sha1_process_block64                3482    3481      -1
.rodata                           108460  108412     -48
------------------------------------------------------------------------------
(add/remove: 1/4 grow/shrink: 0/2 up/down: 0/-49) Total: -49 bytes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-02-08 03:29:16 +01:00
Denys Vlasenko
c193cbd6df
libbb/sha1: shrink and speed up unrolled x86-64 code
function                             old     new   delta
sha1_process_block64                3514    3482     -32
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-02-07 02:34:04 +01:00
Denys Vlasenko
6472ac9428
libbb/sha256: optional x86 hardware accelerated hashing
64 bit:
function                             old     new   delta
sha256_process_block64_shaNI           -     730    +730
.rodata                           108314  108586    +272
sha256_begin                          31      83     +52
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1055/-1) Total: 1054 bytes
32 bit:
function                             old     new   delta
sha256_process_block64_shaNI           -     747    +747
.rodata                           104318  104590    +272
sha256_begin                          29      84     +55
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1075/-1) Total: 1074 bytes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-02-03 14:58:02 +01:00
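A note on the sha256 commit above: the growth of sha256_begin alongside the new sha256_process_block64_shaNI suggests that the block function is now chosen at init time, depending on whether the CPU supports the SHA extensions. The following is a minimal sketch of such a check, not the busybox code. The names cpu_has_sha_ni and pick_sha256_block_fn are hypothetical, and it assumes the GCC/Clang <cpuid.h> helper header is available. The feature bit itself (CPUID leaf 7, subleaf 0, EBX bit 29) is the documented SHA flag.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header (assumed available) */
#endif

/* Hypothetical helper: returns 1 if the CPU advertises the SHA extensions. */
static int cpu_has_sha_ni(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 7, subleaf 0: structured extended feature flags.
	 * __get_cpuid_count() returns 0 if the leaf is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 29) & 1;	/* EBX bit 29 = SHA */
#else
	return 0;		/* not x86: no SHA-NI */
#endif
}

/* Hypothetical signature for a function that hashes one 64-byte block. */
typedef void (*sha256_block_fn)(uint32_t state[8], const uint8_t block[64]);

/* Pick the accelerated routine only when the CPU supports it. */
static sha256_block_fn pick_sha256_block_fn(sha256_block_fn generic,
					    sha256_block_fn accelerated)
{
	return cpu_has_sha_ni() ? accelerated : generic;
}
```

Doing the check once at init and storing a function pointer keeps the per-block path free of feature tests, at the cost of a few dozen bytes in the init function.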
Denys Vlasenko
205042c07a
libbb/sha1: in unrolled x86-64 code, pass initial W[] in registers, not on stack
This can be faster on some CPUs.
On Skylake, load latency from L1 (or store-to-load
forwarding in the LSU) is evidently low enough to completely
hide the memory reference latencies here.
function                             old     new   delta
sha1_process_block64                3495    3514     +19
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-25 17:21:45 +01:00
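For readers unfamiliar with W[] in the commit above: in SHA-1 the first 16 schedule words are the 64-byte block read as big-endian 32-bit words, and each later word is a rotated XOR of four earlier ones. The following is a plain C sketch of that schedule, not the generated assembly. The names rol32, sha1_load_w and sha1_next_w are hypothetical. It keeps only the last 16 words in a ring, which is what makes holding them in registers rather than on the stack feasible.

```c
#include <stdint.h>

/* Rotate left; callers here only use n = 1, so the shift counts are valid. */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Load the initial W[0..15] from the block as big-endian words. */
static void sha1_load_w(uint32_t W[16], const uint8_t block[64])
{
	for (int i = 0; i < 16; i++) {
		W[i] = (uint32_t)block[4 * i] << 24
		     | (uint32_t)block[4 * i + 1] << 16
		     | (uint32_t)block[4 * i + 2] << 8
		     | (uint32_t)block[4 * i + 3];
	}
}

/* Compute W[t] for t >= 16, using W[] as a 16-entry ring:
 *   W[t] = rol(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1)
 * The slot for W[t-16] is overwritten with the new word. */
static uint32_t sha1_next_w(uint32_t W[16], unsigned t)
{
	uint32_t w = W[(t - 3) & 15] ^ W[(t - 8) & 15]
		   ^ W[(t - 14) & 15] ^ W[t & 15];

	W[t & 15] = rol32(w, 1);
	return W[t & 15];
}
```

Rounds 0..15 consume the loaded words directly. From round 16 onward, sha1_next_w(W, t) would be called once per round with increasing t.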
Denys Vlasenko
39369ff460
libbb/sha1: use SSE2 in unrolled x86-64 code. ~10% faster
function                             old     new   delta
.rodata                           108241  108305     +64
sha1_process_block64                3502    3495      -7
------------------------------------------------------------------------------
(add/remove: 5/0 grow/shrink: 1/1 up/down: 64/-7) Total: 57 bytes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-23 12:57:27 +01:00
Denys Vlasenko
805ececa61
whitespace fix
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-08 00:41:09 +01:00
Denys Vlasenko
c3cfcc9242
libbb/sha1: x86_64 version: reorder prologue/epilogue insns
Not clear exactly why, but this increases hashing speed
on Skylake from 454 MB/s to 464 MB/s.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-04 01:45:52 +01:00
Denys Vlasenko
7abb2bb96e
libbb/sha1: x86_64 version: tidying up, no code changes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 17:02:48 +01:00
Denys Vlasenko
4387077f8e
typo fix
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 13:14:09 +01:00
Denys Vlasenko
947bef0dea
libbb/sha1: x86_64 version: generate from a script, optimize a bit
function                             old     new   delta
sha1_process_block64                3569    3502     -67
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 13:10:30 +01:00
Denys Vlasenko
05fd13ebec
libbb/sha1: x86_64 version: move to a separate .S file, no code changes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 12:57:36 +01:00