Adventures in CUDA Path Tracing: Part 1

I thought I'd have a go at implementing some path tracing in CUDA. Let's start simple: a classical path tracer with explicit direct lighting. Lots of hacks:

Some initial kernel stats and performance on my lowly GeForce 9500 GT at 512x512 for a single ray per pixel:

I've no idea how many rays/s I get for 2+ bounces, since many rays will have terminated by then and I haven't put any debug counters in to track this. Note the big jump in register count at 1 bounce, which comes from adding a classic path tracing loop to the kernel.
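A classic iterative path tracing loop with explicit direct lighting looks roughly like the following. This is a minimal sketch written as plain host C++, not the kernel measured above. The `Bounce` struct and the `trace` callback are hypothetical stand-ins for the intersection, light-sampling and direction-sampling code, and the radiance is greyscale for brevity. The interesting part is the state that survives from one iteration to the next (`radiance`, `throughput`, and in a real kernel also the ray origin/direction and RNG state). All of it has to stay resident for the whole loop, which is one plausible source of the register jump.

```cpp
// Hypothetical result of intersecting and shading one path vertex.
struct Bounce {
    bool hit;      // false if the ray escaped the scene
    float direct;  // explicit direct-lighting estimate at the hit (light sample + shadow ray)
    float albedo;  // diffuse reflectance; with cosine-weighted sampling, f * cos / pdf == albedo
};

// Classic iterative path tracer with explicit direct lighting (greyscale sketch).
// `trace(depth)` is a hypothetical callback that intersects the current ray,
// estimates direct lighting at the hit and picks the next direction internally.
// Emission seen directly by the camera is left out to keep the sketch short.
template <typename TraceFn>
float tracePath(TraceFn trace, int maxBounces) {
    float radiance = 0.0f;    // light gathered along the path so far
    float throughput = 1.0f;  // product of the BSDF weights of all previous vertices
    for (int depth = 0; depth <= maxBounces; ++depth) {
        Bounce b = trace(depth);
        if (!b.hit) break;                  // path escaped: nothing more to add
        radiance += throughput * b.direct;  // explicit direct lighting at this vertex
        throughput *= b.albedo;             // continue along the sampled diffuse direction
    }
    return radiance;
}
```

With a hypothetical grey scene where every vertex returns `direct = 1` and `albedo = 0.5`, `maxBounces = 0` gives 1 (direct lighting at the primary hit only) and `maxBounces = 2` gives 1 + 0.5 + 0.25 = 1.75.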

Performance for this simple scene is not good. My occupancy is an awful 17% for each of the kernels. This obviously needs improving, but I'm way over the register limit. To get to 50% occupancy, I'd need to get down to 20 registers per thread; to get to 100%, down to 10. Switching to a more CUDA-friendly ray/triangle test will probably help a bit, but it isn't going to perform miracles. The problem is the structure of the kernel itself: it tries to do everything in one loop. From reading the very nice CUDA-related papers from this year's SIGGRAPH, I realise that I'll have to move to some kind of job-based system eventually, but I was surprised to hit register problems at such low complexity.
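The occupancy figures follow from the per-multiprocessor limits of compute capability 1.1 parts such as the 9500 GT: 8192 registers, 768 resident threads (24 warps) and 8 resident blocks, so 17% corresponds to 4 of 24 warps. The back-of-the-envelope calculator below is a simplified sketch, not NVIDIA's occupancy calculator. It assumes a 128-thread block, rounds each block's register usage up to an assumed 256-register allocation unit, and ignores shared memory and warp allocation granularity.

```cpp
#include <algorithm>

// Per-multiprocessor limits for compute capability 1.0/1.1 devices.
constexpr int kRegistersPerSM = 8192;
constexpr int kMaxThreadsPerSM = 768;
constexpr int kMaxBlocksPerSM = 8;
constexpr int kWarpSize = 32;
constexpr int kMaxWarpsPerSM = kMaxThreadsPerSM / kWarpSize;  // 24
constexpr int kRegAllocUnit = 256;  // assumed per-block register allocation unit

// Simplified model: ignores shared memory and warp allocation granularity.
int activeWarpsPerSM(int regsPerThread, int threadsPerBlock) {
    int regsPerBlock = regsPerThread * threadsPerBlock;
    regsPerBlock = (regsPerBlock + kRegAllocUnit - 1) / kRegAllocUnit * kRegAllocUnit;
    int blocksByRegs = kRegistersPerSM / regsPerBlock;
    int blocksByThreads = kMaxThreadsPerSM / threadsPerBlock;
    int blocks = std::min({blocksByRegs, blocksByThreads, kMaxBlocksPerSM});
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    return blocks * warpsPerBlock;
}

// Fraction of the 24 warp slots that can be resident at once.
double occupancy(int regsPerThread, int threadsPerBlock) {
    return static_cast<double>(activeWarpsPerSM(regsPerThread, threadsPerBlock)) /
           kMaxWarpsPerSM;
}
```

Under these assumptions `occupancy(20, 128)` is 0.5 (3 blocks of 4 warps) and `occupancy(10, 128)` is 1.0 (6 blocks of 4 warps), matching the 20- and 10-register targets.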

More on this topic soon (I don't get paid to experiment with CUDA, sadly). In the meantime, here are some "novel viewpoint" shots of the 2-bounce kernel as it accumulates 1, 16 and 512 rays per pixel (you can see just how much my RNG sucks):

1 ray per pixel

16 rays per pixel

512 rays per pixel
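On the RNG front, one common cheap remedy on GPUs (sketched here as an assumption, not what this kernel uses) is to scramble a unique per-pixel, per-pass seed with Thomas Wang's 32-bit integer hash and then run Marsaglia's xorshift32 generator, which needs only one 32-bit word of state per thread. The `makePixelRng` seeding scheme is hypothetical. It assumes `pass * numPixels + pixelIndex` fits in 32 bits, which keeps every seed distinct.

```cpp
#include <cstdint>

// Thomas Wang's 32-bit integer hash: decorrelates neighbouring seeds.
inline std::uint32_t wangHash(std::uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed = seed ^ (seed >> 4);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15);
    return seed;
}

// Marsaglia xorshift32: a single 32-bit word of state per thread.
struct XorShift32 {
    std::uint32_t state;

    explicit XorShift32(std::uint32_t seed) : state(wangHash(seed)) {
        if (state == 0u) state = 1u;  // xorshift gets stuck at 0, so never start there
    }

    std::uint32_t nextU32() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // Uniform float in [0, 1): keep the top 24 bits so the value is exact in a float.
    float nextFloat() {
        return static_cast<float>(nextU32() >> 8) * (1.0f / 16777216.0f);
    }
};

// Hypothetical seeding: one distinct seed per (pixel, pass) pair.
inline XorShift32 makePixelRng(std::uint32_t pixelIndex, std::uint32_t pass,
                               std::uint32_t numPixels) {
    return XorShift32(pass * numPixels + pixelIndex);
}
```

For a 512x512 image, `numPixels` would be 262144, so each accumulation pass starts every pixel from a fresh, hashed state instead of a correlated one.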