random: use rejection sampling for uniform bounded random integers

Until the very recent commits, many bounded random integers were calculated using `get_random_u32() % max_plus_one`, which not only incurs the price of a division -- indicating performance mostly was not a real issue -- but also does not result in a uniformly distributed output if max_plus_one is not a power of two. Recent commits moved to using `prandom_u32_max(max_plus_one)`, which replaces the division with a faster multiplication, but still does not solve the issue with non-uniform output. For some users, maybe this isn't a problem, and for others, maybe it is, but for the majority of users, probably the question has never been posed and analyzed, and nobody thought much about it, probably assuming random is random is random. In other words, the unthinking expectation of most users is likely that the resultant numbers are uniform. So we implement here an efficient way of generating uniform bounded random integers. Through use of compile-time evaluation, and avoiding divisions as much as possible, this commit introduces no measurable overhead. At least for hot-path uses tested, any potential difference was lost in the noise. On both clang and gcc, code generation is pretty small. The new function, get_random_u32_below(), lives in random.h, rather than prandom.h, and has a "get_random_xxx" function name, because it is suitable for all uses, including cryptography. In order to be efficient, we implement a kernel-specific variant of Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. The kernel's variant takes advantage of constant folding to avoid divisions entirely in the vast majority of cases, works on both 32-bit and 64-bit architectures, and requests a minimal amount of bytes from the RNG. Link: https://arxiv.org/pdf/1805.10941.pdf Cc: stable@vger.kernel.org # to ease future backports that use this api Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
author: Jason A. Donenfeld <Jason@zx2c4.com> 2022-10-09 04:42:54 +0200
committer: Jason A. Donenfeld <Jason@zx2c4.com> 2022-11-17 17:36:47 +0100
commit: e9a688bcb19348862afe30d7c85bc37c4c293471 (patch)
tree: 58a32ddf14c44ec5fb12fb7a13c32ae7adaa4455 /include/linux/random.h
parent: kcsan: remove rng selftest (diff)
download: linux-e9a688bcb19348862afe30d7c85bc37c4c293471.tar.xz
linux-e9a688bcb19348862afe30d7c85bc37c4c293471.zip
1 files changed, 40 insertions, 0 deletions
diff --git a/include/linux/random.h b/include/linux/random.h
index 147a5e0d0b8e..3a82c0a8bc46 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -51,6 +51,46 @@ static inline unsigned long get_random_long(void)
 #endif
 }
 
+u32 __get_random_u32_below(u32 ceil);
+
+/*
+ * Returns a random integer in the interval [0, ceil), with uniform
+ * distribution, suitable for all uses. Fastest when ceil is a constant, but
+ * still fast for variable ceil as well.
+ */
+static inline u32 get_random_u32_below(u32 ceil)
+{
+	if (!__builtin_constant_p(ceil))
+		return __get_random_u32_below(ceil);
+
+	/*
+	 * For the fast path, below, all operations on ceil are precomputed by
+	 * the compiler, so this incurs no overhead for checking pow2, doing
+	 * divisions, or branching based on integer size. The resultant
+	 * algorithm does traditional reciprocal multiplication (typically
+	 * optimized by the compiler into shifts and adds), rejecting samples
+	 * whose lower half would indicate a range indivisible by ceil.
+	 */
+	BUILD_BUG_ON_MSG(!ceil, "get_random_u32_below() must take ceil > 0");
+	if (ceil <= 1)
+		return 0;
+	for (;;) {
+		if (ceil <= 1U << 8) {
+			u32 mult = ceil * get_random_u8();
+			if (likely(is_power_of_2(ceil) || (u8)mult >= (1U << 8) % ceil))
+				return mult >> 8;
+		} else if (ceil <= 1U << 16) {
+			u32 mult = ceil * get_random_u16();
+			if (likely(is_power_of_2(ceil) || (u16)mult >= (1U << 16) % ceil))
+				return mult >> 16;
+		} else {
+			u64 mult = (u64)ceil * get_random_u32();
+			if (likely(is_power_of_2(ceil) || (u32)mult >= -ceil % ceil))
+				return mult >> 32;
+		}
+	}
+}
+
 /*
  * On 64-bit architectures, protect against non-terminated C string overflows
  * by zeroing out the first byte of the canary; this leaves 56 bits of entropy.
author	Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-09 04:42:54 +0200
committer	Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-17 17:36:47 +0100
commit	e9a688bcb19348862afe30d7c85bc37c4c293471 (patch)
tree	58a32ddf14c44ec5fb12fb7a13c32ae7adaa4455 /include/linux/random.h
parent	kcsan: remove rng selftest (diff)
download	linux-e9a688bcb19348862afe30d7c85bc37c4c293471.tar.xz linux-e9a688bcb19348862afe30d7c85bc37c4c293471.zip