Chapter 37. Efficient Random Number Generation and ...
Optimizing a random number generator (RNG) for SIMD units or GPUs means structuring it so that many random numbers are generated at once, typically one per vector lane or GPU thread. Here are key considerations and strategies for optimization:
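Choose an RNG algorithm that is both fast and suitable for parallelization. Generators with a small per-stream state and an update built from simple integer operations (multiplies, adds, shifts, XORs) map well onto vector lanes and GPU threads.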
Use SIMD (Single Instruction, Multiple Data) instructions to generate multiple random numbers in parallel, with each vector lane holding the state of its own generator stream.
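The one-stream-per-lane idea can also be expressed without intrinsics. The sketch below is an illustration, not the chapter's method: it assumes Marsaglia's xorshift32 as the per-lane generator, and the type name Xorshift32x8 is hypothetical. Because all eight lanes are updated with identical, branch-free operations, compilers can often vectorize the loop, although whether they actually emit SIMD instructions depends on the target and optimization flags.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical type: eight independent xorshift32 states stored side by side.
struct Xorshift32x8 {
    std::array<std::uint32_t, 8> s;  // each lane's state must be nonzero

    // Advance every lane by one xorshift32 step (shift amounts 13, 17, 5).
    // Every lane runs the same instructions with no branches, which makes
    // the loop a candidate for auto-vectorization.
    std::array<std::uint32_t, 8> next() {
        for (std::size_t i = 0; i < s.size(); ++i) {
            std::uint32_t x = s[i];
            x ^= x << 13;
            x ^= x >> 17;
            x ^= x << 5;
            s[i] = x;
        }
        return s;  // the new states double as the eight outputs
    }
};
```

Seeding each lane with a different nonzero value gives eight different sequences from a single loop.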
Wrapper types such as SIMDRegister can simplify SIMD operations. For GPU optimization, consider the following:
Balance the need for speed with the quality of random numbers. For applications requiring high statistical quality, consider algorithms like Mersenne Twister or Xoshiro. For real-time applications where speed is critical, simpler generators such as LCGs, including the Park-Miller "minimal standard" generator, may be more appropriate.
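To make the Xoshiro option concrete, here is a sketch of xoshiro128** (Blackman and Vigna), one member of the Xoshiro family mentioned above. Its update uses only shifts, XORs, and rotations on four 32-bit words, which map directly onto SIMD integer instructions. The struct name is an assumption for illustration, and so is the seeding: the state must not be all zeros, and real code usually fills it from a separate seeding generator.

```cpp
#include <cstdint>

// Hypothetical wrapper around the xoshiro128** algorithm: 128-bit state, 32-bit outputs.
struct Xoshiro128ss {
    std::uint32_t s[4];  // must not be all zeros

    static std::uint32_t rotl(std::uint32_t x, int k) {
        return (x << k) | (x >> (32 - k));  // k is never 0 here, so no undefined shift
    }

    std::uint32_t next() {
        // "**" output scrambler: multiply, rotate, multiply.
        const std::uint32_t result = rotl(s[1] * 5, 7) * 9;
        const std::uint32_t t = s[1] << 9;
        // Linear state update: only XORs, one shift, and one rotation.
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 11);
        return result;
    }
};
```

Like the LCG, this runs one stream per lane when vectorized, but its state is four words instead of one, which is part of the speed/quality trade-off described above.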
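Here is a simplified example of vectorizing a Park-Miller (Lehmer) LCG, x(n+1) = 48271 * x(n) mod (2^31 - 1), using AVX2 intrinsics. To stay within 32-bit lanes, it splits the state into its low 15 and high 16 bits and uses the identity 2^31 = 1 (mod 2^31 - 1) to fold overflow back into range: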
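#include <immintrin.h>  // AVX2 intrinsics; compile with -mavx2

// Eight 32-bit integer lanes in one 256-bit AVX2 register.
struct Int8v {
    __m256i v;
    Int8v(__m256i x) : v{ x } {}
    Int8v(int a) : v{ _mm256_set1_epi32(a) } {}  // broadcast a to all lanes
    // Low 32 bits of each lane-wise product (wraps modulo 2^32).
    Int8v operator*(Int8v b) const { return _mm256_mullo_epi32(v, b.v); }
    Int8v operator+(Int8v b) const { return _mm256_add_epi32(v, b.v); }
    Int8v operator&(Int8v b) const { return _mm256_and_si256(v, b.v); }
    // Logical (zero-filling) shifts: lanes are treated as unsigned.
    Int8v operator<<(int n) const { return _mm256_sll_epi32(v, _mm_cvtsi32_si128(n)); }
    Int8v operator>>(int n) const { return _mm256_srl_epi32(v, _mm_cvtsi32_si128(n)); }
};

// One Park-Miller step, state * 48271 mod (2^31 - 1), in all 8 lanes.
// Each lane must hold a value in [1, 2^31 - 2].
Int8v nextRandom(Int8v state) {
    const Int8v A{ 48271 };
    Int8v low  = (state & 0x7fff) * A;  // low 15 bits times A, < 2^31
    Int8v high = (state >> 15) * A;     // high 16 bits times A, < 2^32
    // high * 2^15 == ((high & 0xffff) << 15) + (high >> 16) * 2^31, and 2^31 == 1 (mod 2^31 - 1)
    state = low + ((high & 0xffff) << 15) + (high >> 16);
    // Fold bit 31 back in; the result lies in [1, 2^31 - 2].
    return (state & 0x7fffffff) + (state >> 31);
}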
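Each call to nextRandom advances eight Park-Miller streams at once, one per 32-bit lane, and so produces eight random numbers with a handful of AVX2 instructions. Because the Int8v(int) constructor broadcasts one value to every lane, seed the lanes with distinct values in [1, 2^31 - 2] (for example with _mm256_setr_epi32); otherwise all eight lanes produce the same sequence.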
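Vector code is hard to inspect lane by lane, so it helps to keep a scalar reference. The sketch below needs no AVX support and computes one Park-Miller step two ways: directly with a 64-bit intermediate, and with the same 15/16-bit split that nextRandom uses. The function names parkMillerRef and parkMillerSplit are hypothetical.

```cpp
#include <cstdint>

// Direct definition: x * 48271 mod (2^31 - 1), using a 64-bit intermediate.
std::uint32_t parkMillerRef(std::uint32_t x) {
    return static_cast<std::uint32_t>(std::uint64_t{x} * 48271u % 0x7fffffffu);
}

// Same decomposition as nextRandom, for a single 32-bit lane.
// x must lie in [1, 2^31 - 2].
std::uint32_t parkMillerSplit(std::uint32_t x) {
    const std::uint32_t A = 48271u;
    std::uint32_t low  = (x & 0x7fffu) * A;  // < 2^31
    std::uint32_t high = (x >> 15) * A;      // < 2^32, so no wraparound
    // Uses 2^31 == 1 (mod 2^31 - 1) to fold the top bits of high back in; sum < 2^32.
    std::uint32_t s = low + ((high & 0xffffu) << 15) + (high >> 16);
    return (s & 0x7fffffffu) + (s >> 31);
}
```

To check the AVX2 version, one could seed the eight lanes, call nextRandom, store the register with _mm256_storeu_si256, and compare each lane against parkMillerRef applied to that lane's seed.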