f92ae1dc4b
Change sp_256to512z_mont_{mul,sqr}_8 to not require/zero upper 256 bits. There is only one place where we actually used that (and that's why there used to be zeroing memset of top half!). Fix up that place. As a bonus, 256x256->512 multiply no longer needs to care for "r overlaps a or b" case. This shrinks sp_point structure as well, not just temporaries. function old new delta sp_256to512z_mont_mul_8 150 - -150 sp_256_mont_mul_8 - 147 +147 sp_256to512z_mont_sqr_8 7 - -7 sp_256_mont_sqr_8 - 7 +7 sp_256_ecc_mulmod_8 494 543 +49 sp_512to256_mont_reduce_8 243 249 +6 sp_256_point_from_bin2x32 73 70 -3 sp_256_proj_point_dbl_8 353 345 -8 sp_256_proj_point_add_8 544 499 -45 ------------------------------------------------------------------------------ (add/remove: 2/2 grow/shrink: 2/3 up/down: 209/-213) Total: -4 bytes Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>