Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
/*
|
|
|
|
* SPDX-FileCopyrightText: Alejandro Colomar <alx@kernel.org>
|
|
|
|
*
|
|
|
|
* SPDX-License-Identifier: BSD-3-Clause
|
|
|
|
*/
|
|
|
|
|
2022-12-30 19:46:09 +01:00
|
|
|
#include <config.h>
|
|
|
|
|
|
|
|
#ident "$Id$"
|
|
|
|
|
|
|
|
#include <limits.h>
|
2023-01-30 13:13:36 +01:00
|
|
|
#include <stdint.h>
|
2022-12-30 19:46:09 +01:00
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include <unistd.h>
|
|
|
|
#if HAVE_SYS_RANDOM_H
|
|
|
|
#include <sys/random.h>
|
|
|
|
#endif
|
2023-01-30 13:13:36 +01:00
|
|
|
#include "bit.h"
|
Optimize csrand_uniform()
Use a different algorithm to minimize rejection. This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.
Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below. However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.
I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it. It is
published in the GitHub pull request linked below.
Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.
masking:
It discards the extra bits of entropy that are not necessary for
this operation. This works as if dividing the entire space of
possible csrand() values into smaller spaces of a size that is
a smaller power of 2. Each of those smaller spaces has a
rejection band, so we get as many rejection bands as spaces
there are. For smaller values of 'n', the size of each
rejection band is smaller, but having more rejection bands
compensates for this, and results in the same inefficiency as
for large values of 'n'.
multiplication:
It divides the entire space of possible random numbers in
chunks of size exactly 'n', so that there is only one rejection
band that is the remainder of `2^64 % n`. The worst case is
still similar to the masking algorithm, a rejection band that is
almost half the entire space (n = 2^63 + 1), but for lower
values of 'n', by only having one small rejection band, it is
much faster than the masking algorithm.
This algorithm, however, has one caveat: the implementation
is harder to read, since it relies on several bitwise tricky
operations to perform operations like `2^64 % n`, `mult % 2^64`,
and `mult / 2^64`. And those operations are different depending
on the number of bits of the maximum possible random number
generated by the function. This means that while this algorithm
could also be applied to get uniform random numbers in the range
[0, n-1] quickly from a function like rand(3), which only
produces 31 bits of (non-CS) random numbers, it would need to be
implemented differently. However, that's not a concern for us,
it's just a note so that nobody picks this code and expects it
to just work with rand(3) (which BTW I tried for testing it, and
got a bit confused until I realized this).
Finally, here's some light testing of this implementation, just to know
that I didn't goof it. I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).
int main(int argc, char *argv[])
{
printf("%lu\n", csrand_uniform(atoi(argv[1])));
}
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327
This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <https://github.com/shadow-maint/shadow/pull/624#discussion_r1059574358>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-01-18 13:26:47 +01:00
|
|
|
#include "defines.h"
|
2022-12-30 19:46:09 +01:00
|
|
|
#include "prototypes.h"
|
|
|
|
#include "shadowlog.h"
|
|
|
|
|
|
|
|
|
2023-01-30 13:13:36 +01:00
|
|
|
static uint32_t csrand_uniform32(uint32_t n);
|
|
|
|
static unsigned long csrand_uniform_slow(unsigned long n);
|
|
|
|
|
|
|
|
|
2022-12-30 19:46:09 +01:00
|
|
|
/*
|
|
|
|
* Return a uniformly-distributed CS random u_long value.
|
|
|
|
*/
|
|
|
|
unsigned long
|
|
|
|
csrand(void)
|
|
|
|
{
|
|
|
|
FILE *fp;
|
|
|
|
unsigned long r;
|
|
|
|
|
|
|
|
#ifdef HAVE_GETENTROPY
|
|
|
|
/* getentropy may exist but lack kernel support. */
|
|
|
|
if (getentropy(&r, sizeof(r)) == 0)
|
|
|
|
return r;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef HAVE_GETRANDOM
|
|
|
|
/* Likewise getrandom. */
|
|
|
|
if (getrandom(&r, sizeof(r), 0) == sizeof(r))
|
|
|
|
return r;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef HAVE_ARC4RANDOM_BUF
|
|
|
|
/* arc4random_buf can never fail. */
|
|
|
|
arc4random_buf(&r, sizeof(r));
|
|
|
|
return r;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Use /dev/urandom as a last resort. */
|
|
|
|
fp = fopen("/dev/urandom", "r");
|
|
|
|
if (NULL == fp) {
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (fread(&r, sizeof(r), 1, fp) != 1) {
|
|
|
|
fclose(fp);
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
|
|
|
fclose(fp);
|
|
|
|
return r;
|
|
|
|
|
|
|
|
fail:
|
|
|
|
fprintf(log_get_logfd(), _("Unable to obtain random bytes.\n"));
|
|
|
|
exit(1);
|
|
|
|
}
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return a uniformly-distributed CS random value in the interval [0, n-1].
|
2023-01-30 13:13:36 +01:00
|
|
|
*/
|
|
|
|
unsigned long
|
|
|
|
csrand_uniform(unsigned long n)
|
|
|
|
{
|
|
|
|
if (n == 0 || n > UINT32_MAX)
|
|
|
|
return csrand_uniform_slow(n);
|
|
|
|
|
|
|
|
return csrand_uniform32(n);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return a uniformly-distributed CS random value in the interval [min, max].
|
|
|
|
*/
|
|
|
|
unsigned long
|
|
|
|
csrand_interval(unsigned long min, unsigned long max)
|
|
|
|
{
|
|
|
|
return csrand_uniform(max - min + 1) + min;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
Optimize csrand_uniform()
Use a different algorithm to minimize rejection. This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.
Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below. However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.
I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it. It is
published in the GitHub pull request linked below.
Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.
masking:
It discards the extra bits of entropy that are not necessary for
this operation. This works as if dividing the entire space of
possible csrand() values into smaller spaces of a size that is
a smaller power of 2. Each of those smaller spaces has a
rejection band, so we get as many rejection bands as spaces
there are. For smaller values of 'n', the size of each
rejection band is smaller, but having more rejection bands
compensates for this, and results in the same inefficiency as
for large values of 'n'.
multiplication:
It divides the entire space of possible random numbers in
chunks of size exactly 'n', so that there is only one rejection
band that is the remainder of `2^64 % n`. The worst case is
still similar to the masking algorithm, a rejection band that is
almost half the entire space (n = 2^63 + 1), but for lower
values of 'n', by only having one small rejection band, it is
much faster than the masking algorithm.
This algorithm, however, has one caveat: the implementation
is harder to read, since it relies on several bitwise tricky
operations to perform operations like `2^64 % n`, `mult % 2^64`,
and `mult / 2^64`. And those operations are different depending
on the number of bits of the maximum possible random number
generated by the function. This means that while this algorithm
could also be applied to get uniform random numbers in the range
[0, n-1] quickly from a function like rand(3), which only
produces 31 bits of (non-CS) random numbers, it would need to be
implemented differently. However, that's not a concern for us,
it's just a note so that nobody picks this code and expects it
to just work with rand(3) (which BTW I tried for testing it, and
got a bit confused until I realized this).
Finally, here's some light testing of this implementation, just to know
that I didn't goof it. I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).
int main(int argc, char *argv[])
{
printf("%lu\n", csrand_uniform(atoi(argv[1])));
}
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327
This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <https://github.com/shadow-maint/shadow/pull/624#discussion_r1059574358>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-01-18 13:26:47 +01:00
|
|
|
* Fast Random Integer Generation in an Interval
|
|
|
|
* ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
|
|
|
|
* <https://arxiv.org/abs/1805.10941>
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
*/
|
2023-01-30 13:13:36 +01:00
|
|
|
static uint32_t
|
|
|
|
csrand_uniform32(uint32_t n)
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
{
|
2023-01-30 13:13:36 +01:00
|
|
|
uint32_t bound, rem;
|
|
|
|
uint64_t r, mult;
|
Optimize csrand_uniform()
Use a different algorithm to minimize rejection. This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.
Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below. However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.
I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it. It is
published in the GitHub pull request linked below.
Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.
masking:
It discards the extra bits of entropy that are not necessary for
this operation. This works as if dividing the entire space of
possible csrand() values into smaller spaces of a size that is
a smaller power of 2. Each of those smaller spaces has a
rejection band, so we get as many rejection bands as spaces
there are. For smaller values of 'n', the size of each
rejection band is smaller, but having more rejection bands
compensates for this, and results in the same inefficiency as
for large values of 'n'.
multiplication:
It divides the entire space of possible random numbers in
chunks of size exactly 'n', so that there is only one rejection
band that is the remainder of `2^64 % n`. The worst case is
still similar to the masking algorithm, a rejection band that is
almost half the entire space (n = 2^63 + 1), but for lower
values of 'n', by only having one small rejection band, it is
much faster than the masking algorithm.
This algorithm, however, has one caveat: the implementation
is harder to read, since it relies on several bitwise tricky
operations to perform operations like `2^64 % n`, `mult % 2^64`,
and `mult / 2^64`. And those operations are different depending
on the number of bits of the maximum possible random number
generated by the function. This means that while this algorithm
could also be applied to get uniform random numbers in the range
[0, n-1] quickly from a function like rand(3), which only
produces 31 bits of (non-CS) random numbers, it would need to be
implemented differently. However, that's not a concern for us,
it's just a note so that nobody picks this code and expects it
to just work with rand(3) (which BTW I tried for testing it, and
got a bit confused until I realized this).
Finally, here's some light testing of this implementation, just to know
that I didn't goof it. I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).
int main(int argc, char *argv[])
{
printf("%lu\n", csrand_uniform(atoi(argv[1])));
}
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327
This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <https://github.com/shadow-maint/shadow/pull/624#discussion_r1059574358>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-01-18 13:26:47 +01:00
|
|
|
|
|
|
|
if (n == 0)
|
|
|
|
return csrand();
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
|
Optimize csrand_uniform()
Use a different algorithm to minimize rejection. This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.
Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below. However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.
I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it. It is
published in the GitHub pull request linked below.
Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.
masking:
It discards the extra bits of entropy that are not necessary for
this operation. This works as if dividing the entire space of
possible csrand() values into smaller spaces of a size that is
a smaller power of 2. Each of those smaller spaces has a
rejection band, so we get as many rejection bands as spaces
there are. For smaller values of 'n', the size of each
rejection band is smaller, but having more rejection bands
compensates for this, and results in the same inefficiency as
for large values of 'n'.
multiplication:
It divides the entire space of possible random numbers in
chunks of size exactly 'n', so that there is only one rejection
band that is the remainder of `2^64 % n`. The worst case is
still similar to the masking algorithm, a rejection band that is
almost half the entire space (n = 2^63 + 1), but for lower
values of 'n', by only having one small rejection band, it is
much faster than the masking algorithm.
This algorithm, however, has one caveat: the implementation
is harder to read, since it relies on several bitwise tricky
operations to perform operations like `2^64 % n`, `mult % 2^64`,
and `mult / 2^64`. And those operations are different depending
on the number of bits of the maximum possible random number
generated by the function. This means that while this algorithm
could also be applied to get uniform random numbers in the range
[0, n-1] quickly from a function like rand(3), which only
produces 31 bits of (non-CS) random numbers, it would need to be
implemented differently. However, that's not a concern for us,
it's just a note so that nobody picks this code and expects it
to just work with rand(3) (which BTW I tried for testing it, and
got a bit confused until I realized this).
Finally, here's some light testing of this implementation, just to know
that I didn't goof it. I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).
int main(int argc, char *argv[])
{
printf("%lu\n", csrand_uniform(atoi(argv[1])));
}
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327
This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <https://github.com/shadow-maint/shadow/pull/624#discussion_r1059574358>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-01-18 13:26:47 +01:00
|
|
|
bound = -n % n; // analogous to `2^64 % n`, since `x % y == (x-y) % y`
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
|
|
|
|
do {
|
|
|
|
r = csrand();
|
Optimize csrand_uniform()
Use a different algorithm to minimize rejection. This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.
Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below. However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.
I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it. It is
published in the GitHub pull request linked below.
Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.
masking:
It discards the extra bits of entropy that are not necessary for
this operation. This works as if dividing the entire space of
possible csrand() values into smaller spaces of a size that is
a smaller power of 2. Each of those smaller spaces has a
rejection band, so we get as many rejection bands as spaces
there are. For smaller values of 'n', the size of each
rejection band is smaller, but having more rejection bands
compensates for this, and results in the same inefficiency as
for large values of 'n'.
multiplication:
It divides the entire space of possible random numbers in
chunks of size exactly 'n', so that there is only one rejection
band that is the remainder of `2^64 % n`. The worst case is
still similar to the masking algorithm, a rejection band that is
almost half the entire space (n = 2^63 + 1), but for lower
values of 'n', by only having one small rejection band, it is
much faster than the masking algorithm.
This algorithm, however, has one caveat: the implementation
is harder to read, since it relies on several bitwise tricky
operations to perform operations like `2^64 % n`, `mult % 2^64`,
and `mult / 2^64`. And those operations are different depending
on the number of bits of the maximum possible random number
generated by the function. This means that while this algorithm
could also be applied to get uniform random numbers in the range
[0, n-1] quickly from a function like rand(3), which only
produces 31 bits of (non-CS) random numbers, it would need to be
implemented differently. However, that's not a concern for us,
it's just a note so that nobody picks this code and expects it
to just work with rand(3) (which BTW I tried for testing it, and
got a bit confused until I realized this).
Finally, here's some light testing of this implementation, just to know
that I didn't goof it. I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).
int main(int argc, char *argv[])
{
printf("%lu\n", csrand_uniform(atoi(argv[1])));
}
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327
This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <https://github.com/shadow-maint/shadow/pull/624#discussion_r1059574358>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-01-18 13:26:47 +01:00
|
|
|
mult = r * n;
|
|
|
|
rem = mult; // analogous to `mult % 2^64`
|
|
|
|
} while (rem < bound); // p = (2^64 % n) / 2^64; W.C.: n=2^63+1, p=0.5
|
|
|
|
|
|
|
|
r = mult >> WIDTHOF(n); // analogous to `mult / 2^64`
|
Add csrand_uniform()
This API is similar to arc4random_uniform(3). However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.
This function will be used to reimplement csrand_interval() as a wrapper
around this one.
The current implementation of csrand_interval() doesn't produce very good
random numbers. It has a bias. And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.
Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.
We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.
BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents. Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2022-12-30 19:46:09 +01:00
|
|
|
|
|
|
|
return r;
|
|
|
|
}
|
2022-12-30 19:46:09 +01:00
|
|
|
|
|
|
|
|
2023-01-30 13:13:36 +01:00
|
|
|
static unsigned long
|
|
|
|
csrand_uniform_slow(unsigned long n)
|
2022-12-30 19:46:09 +01:00
|
|
|
{
|
2023-01-30 13:13:36 +01:00
|
|
|
unsigned long r, max, mask;
|
|
|
|
|
|
|
|
max = n - 1;
|
|
|
|
mask = bit_ceil_wrapul(n) - 1;
|
|
|
|
|
|
|
|
do {
|
|
|
|
r = csrand();
|
|
|
|
r &= mask; // optimization
|
|
|
|
} while (r > max); // p = ((mask+1) % n) / (mask+1); W.C.: p=0.5
|
|
|
|
|
|
|
|
return r;
|
2022-12-30 19:46:09 +01:00
|
|
|
}
|