Archive for December, 2009
I needed a random number generator for a CUDA project, and had relatively few requirements:
- It must have a small shared memory footprint
- It must be suitable for Monte Carlo methods (i.e. have long period and minimal correlation)
- It must allow warps to execute independently when generating random numbers
There seem to be two main approaches to RNG in CUDA:
- Each thread has its own local history, operates independently. This can be seen in the Mersenne Twister sample in the CUDA SDK (which has a very short history of 19 values). This usually requires an expensive offline process to seed each thread appropriately to avoid correlation. I can’t spare the registers or local memory for this approach.
- Have a single generator per thread block, parallelise the update between all threads and synchronise using __syncthreads. This is the approach in the recent MTGP CUDA sample. I can’t use this approach because I am allowing each warp in the block to process jobs independently (using persistent threads) – calls to __syncthreads to synchronise every thread in the block are not possible.
What I ended up with is basically a modified version of MTGP (the second approach above), but with each warp able to grab random numbers independently from the shared MT state. This had the nice side-effect of reducing the shared memory footprint to be the same as the equivalent CPU MT implementation. Read the rest of this entry »