Simon's Graphics Blog

Work log for ideas and hobby projects.

Hybrid Bidirectional Path Tracing


I’d like to share my results from converting a CPU-only bidirectional path tracer into a CPU/GPU hybrid (CPU used for shading and sampling, GPU used for ray intersections). These results are a bit old… I posted them a while ago as a thread on ompf. I found out later that this thread had been cited in Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering, so let me summarise it here.

Bidirectional path tracing is usually implemented with a nested loop over the vertices of an eye and light sub-path, with inline calls to traverse the scene for closest-hit rays and shadow rays. To get large numbers of rays in flight, I changed this into a state machine that yields when it needs to fire a ray and, some time later, once the intersection results are available, restores its state to continue shading/sampling. I then implemented a basic job system that spreads 64K of these state machines over the available CPU cores in small groups. This allows me to build up 64K ray requests with all the shading and sampling done on the CPU.
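To make the yield-and-resume idea concrete, here is a minimal sketch using Python generators. It is not the code from this project: `Ray`, `Hit`, `path_task`, `trace_batch` and `run` are all hypothetical names, the shading is a placeholder, and the tracer is a CPU stub that reports a hit for every ray. It also leaves out the job system that spreads tasks over cores.

```python
from dataclasses import dataclass
from typing import Generator, List, Tuple


@dataclass
class Ray:
    origin: Tuple[float, float, float]
    direction: Tuple[float, float, float]


@dataclass
class Hit:
    distance: float  # hypothetical convention: negative means the ray missed


def path_task(max_bounces: int) -> Generator[Ray, Hit, float]:
    """One path as a state machine: yields a Ray, is resumed with its Hit."""
    radiance = 0.0
    throughput = 1.0
    ray = Ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    for _ in range(max_bounces):
        hit = yield ray  # suspend here until the batch tracer has an answer
        if hit.distance < 0.0:
            break
        throughput *= 0.5       # placeholder for real shading/sampling
        radiance += throughput
        ray = Ray((0.0, 0.0, hit.distance), (0.0, 0.0, 1.0))
    return radiance


def trace_batch(rays: List[Ray]) -> List[Hit]:
    """Stand-in for the GPU traversal: every ray hits at distance 1."""
    return [Hit(1.0) for _ in rays]


def run(num_tasks: int, max_bounces: int = 3):
    tasks = [path_task(max_bounces) for _ in range(num_tasks)]
    results = [0.0] * num_tasks
    inbox = {i: None for i in range(num_tasks)}  # value to resume each task with
    batches = 0
    while inbox:
        pending = {}
        for i, value in inbox.items():
            try:
                # send(None) starts a fresh generator; send(hit) resumes it.
                pending[i] = tasks[i].send(value)
            except StopIteration as done:
                results[i] = done.value  # the path finished
        if not pending:
            break
        ids = list(pending)
        hits = trace_batch([pending[i] for i in ids])  # one batch per round
        batches += 1
        inbox = dict(zip(ids, hits))
    return results, batches
```

Each round gathers one ray request from every live task, traces them all as a single batch, and then resumes each task with its own result, so the batch size stays close to the number of live paths.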

The GPU side traces 64K rays through the scene, returning 64K intersection results. For simplicity, I am using Optix (1.x at the time) for the traversal.

The whole system is then double buffered, allowing me to build up one batch of rays split over N CPUs while the previous batch is being traced by the GPU.
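The ordering of a double-buffered loop can be sketched as below. This is an assumption-laden illustration rather than the actual renderer: `build_batch`, `trace_batch` and `shade` are hypothetical stand-ins that operate on integers, and a single worker thread plays the role of the GPU. In pure Python the threads only illustrate the ordering of the work, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def build_batch(index: int, size: int) -> List[int]:
    """CPU-side stand-in: produce `size` ray ids for batch `index`."""
    return [index * size + i for i in range(size)]


def trace_batch(rays: List[int]) -> List[int]:
    """GPU-side stand-in: pretend each ray's result is twice its id."""
    return [r * 2 for r in rays]


def shade(hits: List[int]) -> int:
    """CPU-side stand-in for consuming a batch of intersection results."""
    return sum(hits)


def render(num_batches: int, size: int) -> List[int]:
    totals = []
    with ThreadPoolExecutor(max_workers=1) as gpu:  # one "device" queue
        in_flight = None
        for index in range(num_batches):
            # Build the next batch while the previous one is still tracing.
            rays = build_batch(index, size)
            if in_flight is not None:
                totals.append(shade(in_flight.result()))
            in_flight = gpu.submit(trace_batch, rays)
        if in_flight is not None:
            totals.append(shade(in_flight.result()))  # drain the last batch
    return totals
```

The key point is that `build_batch` for batch N+1 runs before the code blocks on the result of batch N, so neither side waits unless the other is genuinely slower.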

Here’s a shot of it after a few seconds (hence the noise) with the profiler up:

Profiling a Hybrid Bidirectional Path Tracer

The top 4 cyan rows are a Core 2 Quad (Q6600 @2.4GHz). The lower yellow row is a GTX275. The single magenta job is accumulating samples to avoid sample collisions during shading.

Some observations:

  • My CPU is the bottleneck by a long way.
  • Optix spinlocks a CPU while the GPU is busy. I assume this is by design: recent CUDA releases let you choose between min-latency spinlocking and a higher-latency OS event, and it looks like Optix always uses the former, whereas I’d prefer the latter here.
  • The first yellow bar is memcpy, moving rays and intersection results over PCIe. I’d like to try using page-locked host memory to avoid this, but it’s not exposed in Optix.

The authors of Combinatorial Bidirectional Path-Tracing for Efficient Hybrid CPU/GPU Rendering also noticed this CPU bottleneck, so they increased the relative load on the GPU by using correlated sampling of a small set of light sub-paths. This is an interesting performance/coverage trade-off that they then discuss how to optimise for.

Here is a 1024×1024 render of the same scene after 10 or so minutes to clear up most of the noise:

Hybrid Bidirectional Path Tracer

And another piece of serious coder art, using a thin lens camera model and experimenting with unbiased methods to guide light sub-paths towards refractive objects to get caustic paths in larger outdoor scenes:

Hybrid Bidirectional Path Tracer

While an interesting side project, I don’t think I’ll continue with this approach. If you’re considering splitting work between CPU and GPU, I’d advise splitting in a way that allows dynamic load balancing, such as using OpenCL on both devices and splitting up the framebuffer. For my part, I’m going to continue to focus on rendering primarily on the GPU.
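As a rough illustration of what dynamic load balancing over the framebuffer could look like, the sketch below simulates two devices pulling tiles from a shared queue as they become free. It is a timing simulation only, with no OpenCL involved; `make_tiles`, `balance` and the per-tile costs are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Tile = Tuple[int, int]


def make_tiles(width: int, height: int, tile: int) -> List[Tile]:
    """Split the framebuffer into tile origins, row by row."""
    return [(x, y) for y in range(0, height, tile)
            for x in range(0, width, tile)]


def balance(tiles: List[Tile], seconds_per_tile: Dict[str, float]):
    """Simulate devices taking the next tile whenever they become free.

    seconds_per_tile maps a device name to its (assumed) cost per tile.
    """
    free_at = [(0.0, name) for name in sorted(seconds_per_tile)]
    heapq.heapify(free_at)
    assigned: Dict[str, List[Tile]] = {name: [] for name in seconds_per_tile}
    for t in tiles:
        now, name = heapq.heappop(free_at)  # the earliest-free device
        assigned[name].append(t)
        heapq.heappush(free_at, (now + seconds_per_tile[name], name))
    finish = max(time for time, _ in free_at)
    return assigned, finish
```

With a GPU assumed to be four times faster per tile than the CPU, the faster device simply ends up taking more tiles, without either side needing to know the ratio in advance:

```python
assigned, finish = balance(make_tiles(64, 64, 16), {"cpu": 4.0, "gpu": 1.0})
print(len(assigned["cpu"]), len(assigned["gpu"]), finish)
```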

Written by Simon Brown

March 21st, 2011 at 9:14 pm

3 Responses to 'Hybrid Bidirectional Path Tracing'


  1. When you say rendering primarily on the GPU do you mean that you will do your path tracer completely on the GPU?

    I’m considering moving my GPU-only baker to a hybrid approach because the bandwidth for moving all the material data around is getting pretty large (I store 27 SH coeffs per vert + extra data). I’d like to do integration on the CPU and use batched GPU ray casts for intersection tests.

    This also has the benefit of being a lot easier for me to debug and optimize the integration process because it’s on the CPU and I already have a nice multicore PC library.

    I’m also using Optix. Did you move away from optix for the tracing and go with CUDA?

    Thanks!
    -= Dave

    David Neubelt

    26 Mar 11 at 9:31 pm

  2. Yeah there’s still a lot of stuff I want to try that has at least shading and traversal on the GPU. The tracer in two-way path tracing was pure CUDA, but I get very similar throughput using Optix. Right now I’m back to using Optix just for ease of prototyping.

    I agree that performing only traversal on the GPU makes for a nicer coding and debugging environment, but I found that for path tracing it wasn’t that much faster than CPU-only (which makes things even easier). Of course results may vary based on how much of your algorithm is traversal (as noted in the paper above). I found that moving the whole path tracer to the GPU gets me around 10x the performance. I should also concede that this is a prototyping environment where I don’t have much spinup/spindown time.

    Simon Brown

    27 Mar 11 at 9:24 am

  3. I guess the only way for me to find out if it’s going to be faster to do the shading/integration on the CPU and ray intersection on the GPU versus all GPU is to try it both ways. I’ve already got it working with everything on the GPU, so now it’s time to try the hybrid approach.

    Ugg. :)

    If you’re interested then shoot me an email and I’ll send you the results when I’m done.

    David Neubelt

    27 Mar 11 at 11:04 pm
