From 1a0e13f94edc08d4c773a00d710d56fac966e818 Mon Sep 17 00:00:00 2001 From: Alejandro Colomar Date: Wed, 18 Jan 2023 13:26:47 +0100 Subject: [PATCH] Optimize csrand_uniform() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use a different algorithm to minimize rejection. This is essentially the same algorithm implemented in the Linux kernel for __get_random_u32_below(), but written in a more readable way, and avoiding microopimizations that make it less readable. Which (the Linux kernel implementation) is itself based on Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. However, I couldn't really understand that paper very much, so I had to reconstruct the proofs from scratch, just from what I could understand from the Linux kernel implementation source code. I constructed some graphical explanation of how it works, and why it is optimal, because I needed to visualize it to understand it. It is published in the GitHub pull request linked below. Here goes a wordy explanation of why this algorithm based on multiplication is better optimized than my original implementation based on masking. masking: It discards the extra bits of entropy that are not necessary for this operation. This works as if dividing the entire space of possible csrand() values into smaller spaces of a size that is a smaller power of 2. Each of those smaller spaces has a rejection band, so we get as many rejection bands as spaces there are. For smaller values of 'n', the size of each rejection band is smaller, but having more rejection bands compensates for this, and results in the same inefficiency as for large values of 'n'. multiplication: It divides the entire space of possible random numbers in chunks of size exactly 'n', so that there is only one rejection band that is the remainder of `2^64 % n`. The worst case is still similar to the masking algorithm, a rejection band that is almost half the entire space (n = 2^63 + 1), but for lower values of 'n', by only having one small rejection band, it is much faster than the masking algorithm. This algorithm, however, has one caveat: the implementation is harder to read, since it relies on several bitwise tricky operations to perform operations like `2^64 % n`, `mult % 2^64`, and `mult / 2^64`. And those operations are different depending on the number of bits of the maximum possible random number generated by the function. This means that while this algorithm could also be applied to get uniform random numbers in the range [0, n-1] quickly from a function like rand(3), which only produces 31 bits of (non-CS) random numbers, it would need to be implemented differently. However, that's not a concern for us, it's just a note so that nobody picks this code and expects it to just work with rand(3) (which BTW I tried for testing it, and got a bit confused until I realized this). Finally, here's some light testing of this implementation, just to know that I didn't goof it. I pasted this function into a standalone program, and run it many times to find if it has any bias (I tested also to see how many iterations it performs, and it's also almost always 1, but that test is big enough to not paste it here). int main(int argc, char *argv[]) { printf("%lu\n", csrand_uniform(atoi(argv[1]))); } $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 341 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 339 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 338 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 336 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 328 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 335 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 332 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 331 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 327 This isn't a complete test for a cryptographically-secure random number generator, of course, but I leave that for interested parties. Link: Link: Link: Cc: "Jason A. Donenfeld" Cc: Cristian Rodríguez Cc: Adhemerval Zanella Cc: Björn Esser Cc: Yann Droneaud Cc: Joseph Myers Cc: Sam James Cc: Serge Hallyn Cc: Iker Pedrosa [Daniel Lemire: Added link to research paper in source code] Cc: Daniel Lemire Signed-off-by: Alejandro Colomar --- libmisc/csrand.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/libmisc/csrand.c b/libmisc/csrand.c index f06b9579..d4dabe54 100644 --- a/libmisc/csrand.c +++ b/libmisc/csrand.c @@ -15,7 +15,7 @@ #if HAVE_SYS_RANDOM_H #include #endif -#include "bit.h" +#include "defines.h" #include "prototypes.h" #include "shadowlog.h" @@ -69,19 +69,29 @@ fail: /* * Return a uniformly-distributed CS random value in the interval [0, n-1]. + * + * Fast Random Integer Generation in an Interval + * ACM Transactions on Modeling and Computer Simulation 29 (1), 2019 + * */ unsigned long csrand_uniform(unsigned long n) { - unsigned long r, max, mask; + unsigned long bound, rem; + unsigned __int128 r, mult; - max = n - 1; - mask = bit_ceil_wrapul(n) - 1; + if (n == 0) + return csrand(); + + bound = -n % n; // analogous to `2^64 % n`, since `x % y == (x-y) % y` do { r = csrand(); - r &= mask; // optimization - } while (r > max); // p = ((mask + 1) % n) / (mask + 1); W.C.: p=0.5 + mult = r * n; + rem = mult; // analogous to `mult % 2^64` + } while (rem < bound); // p = (2^64 % n) / 2^64; W.C.: n=2^63+1, p=0.5 + + r = mult >> WIDTHOF(n); // analogous to `mult / 2^64` return r; }