Archive for the ‘Rendering’ Category
CUDA Mersenne Twister
I needed a random number generator for a CUDA project, and had relatively few requirements:
- It must have a small shared memory footprint
- It must be suitable for Monte Carlo methods (i.e. have long period and minimal correlation)
- It must allow warps to execute independently when generating random numbers
There seem to be two main approaches to RNG in CUDA:
- Each thread has its own local history and operates independently. This can be seen in the Mersenne Twister sample in the CUDA SDK (which has a very short history of 19 values). This usually requires an expensive offline process to seed each thread appropriately to avoid correlation. I can’t spare the registers or local memory for this approach.
- Have a single generator per thread block, parallelise the update between all threads and synchronise using __syncthreads. This is the approach in the recent MTGP CUDA sample. I can’t use this approach because I am allowing each warp in the block to process jobs independently (using persistent threads) – calls to __syncthreads to synchronise every thread in the block are not possible.
What I ended up with is basically a modified version of MTGP (the second approach above), but with each warp able to grab random numbers independently from the shared MT state. This had the nice side-effect of reducing the shared memory footprint to be the same as the equivalent CPU MT implementation.
Adventures in CUDA Path Tracing: Part 1
I thought I’d have a go at implementing some path tracing in CUDA. Let’s start simple: a classical path tracer with explicit direct lighting. Lots of hacks:
- No BVH yet, every ray tests the 30 triangles of the Cornell Box
- Every surface is Lambertian (so cosine-weighted hemisphere sampling for spawning rays)
- Hardcoded for a single area light (which the camera cannot see)
- Uses copy-pasted Möller intersection test from CPU code
- Random number generation got moved to a texture read (with the texture data updated CPU-side) to avoid absurd register counts
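The cosine-weighted hemisphere sampling mentioned in the list above can be sketched as follows. This is a minimal CPU-side C++ sketch, not the code from the path tracer itself: the names `Vec3`, `sampleCosineHemisphere` and `cosineHemispherePdf` are hypothetical, and the direction is returned in a local frame whose +Z axis is assumed to be the surface normal.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

constexpr float kPi = 3.14159265358979f;

// Map two uniform numbers in [0,1) to a unit direction on the +Z hemisphere,
// distributed with density cos(theta) / pi (the disc radius is sqrt(u1)).
Vec3 sampleCosineHemisphere(float u1, float u2) {
    assert(u1 >= 0.0f && u1 < 1.0f && u2 >= 0.0f && u2 < 1.0f);
    const float r = std::sqrt(u1);                         // radius on the unit disc
    const float phi = 2.0f * kPi * u2;                     // angle around the normal
    const float z = std::sqrt(std::max(0.0f, 1.0f - u1));  // cos(theta)
    return { r * std::cos(phi), r * std::sin(phi), z };
}

// Density of the direction above with respect to solid angle.
float cosineHemispherePdf(const Vec3& d) {
    return d.z > 0.0f ? d.z / kPi : 0.0f;
}
```

With a Lambertian BRDF of albedo / pi, the cosine term and this density cancel, so each spawned ray simply carries a weight equal to the albedo.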
Convergence
I’m extremely excited about the results of Understanding the Efficiency of Ray Traversal on GPUs, and the related work by NVIDIA on ray traversal. In a programming way of course.
There’s this interesting paradigm shift from a strongly geometric grid model to one where we have persistent threads running small kernels (or actually large kernels due to the way CUDA code is currently linked) and grabbing their own jobs asynchronously. The interesting thing about this shift is that this is the way PS3 developers on Cell have been writing SPU job systems for years. Now I admit that the underlying hardware is radically different (massive hardware threading and wide SIMD vs no hardware threading and more conventional SIMD), but the same simple primitives of a resident kernel using atomic increment to grab from a shared job list still apply. I have no idea where this programming model is going to converge, but it certainly looks like it is converging.
(Atomic increment actually only requires CUDA compute capability 1.1, so even your one-year-old laptop with an NVIDIA mobile chipset can probably run this sort of code. Of course it’s nicer with the 1.3 voting primitives, but you can emulate these through shared memory, so no need to go bargain hunting for a GTX 260 just yet.)
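The primitive of a resident worker grabbing jobs from a shared list with an atomic increment can be sketched on the CPU. This is a minimal C++ analogue using `std::thread` and `std::atomic`, not CUDA or SPU code: `JobList`, `persistentWorker` and `runJobs` are hypothetical names, and the "job" (squaring an integer) is a placeholder.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical shared job list: each job squares one input value.
struct JobList {
    std::vector<int> input;
    std::vector<int> output;
    std::atomic<int> next{0};  // index of the next unclaimed job
};

// Resident worker: keeps claiming jobs until the counter passes the end.
void persistentWorker(JobList& jobs) {
    const int count = static_cast<int>(jobs.input.size());
    for (;;) {
        // Atomic increment hands each job index to exactly one worker.
        const int i = jobs.next.fetch_add(1, std::memory_order_relaxed);
        if (i >= count) break;
        jobs.output[i] = jobs.input[i] * jobs.input[i];
    }
}

// Launch a fixed pool of workers and wait for the list to drain.
std::vector<int> runJobs(const std::vector<int>& input, int workerCount) {
    assert(workerCount > 0);
    JobList jobs;
    jobs.input = input;
    jobs.output.assign(input.size(), 0);
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w)
        workers.emplace_back(persistentWorker, std::ref(jobs));
    for (auto& t : workers) t.join();
    return jobs.output;
}
```

Calling `runJobs({1, 2, 3, 4, 5}, 4)` processes five jobs with four workers; which worker handles which job varies from run to run, but every job is claimed exactly once.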
Metropolis Light Transport
Here is a collection of papers/links on the topic of Metropolis Light Transport (MLT). The core principle of detailed balance that underpins the Metropolis-Hastings algorithm is extremely neat, and its application to light transport (in particular using Veach’s path integral formulation) is very aesthetically pleasing. This post doesn’t really go anywhere, just provides links for further reading.
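For reference, the detailed balance condition and the resulting Metropolis-Hastings acceptance probability can be written as below. The notation is an assumption rather than something taken from this post: f is the target function (in MLT, the contribution of a path), T(y | x) is the density of proposing state y from state x, and a(y | x) is the probability of accepting that proposal.

```latex
f(x)\, T(y \mid x)\, a(y \mid x) = f(y)\, T(x \mid y)\, a(x \mid y),
\qquad
a(y \mid x) = \min\!\left(1,\ \frac{f(y)\, T(x \mid y)}{f(x)\, T(y \mid x)}\right)
```

Substituting the second expression into the first shows that both sides reduce to min(f(x) T(y | x), f(y) T(x | y)), so the condition holds for any proposal density with non-zero reverse probability.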
Multiple Importance
At work I wrote a global illumination system from scratch. It used classical ray tracing for the direct lighting, and photon mapping with final gather for the indirect term. I use the past tense since we’ve now switched over to using lightcuts as the main renderer, which, thanks to the work of an awesome colleague, is giving us better results (and faster).
To complete the set, I thought I’d have a go at implementing a bidirectional path tracer, a “full Veach”, if you will…
Gamma-Correct Rendering
With consumer-level hardware now capable of rendering high dynamic range image data, the days of the 8-bit sRGB framebuffer are numbered. Programmers of next-generation graphics devices are able to model lighting systems to high accuracy, then tone-map these values into a displayable range for conventional 8-bit sRGB equipment, such as PC monitors.
The graphics pipeline from source art to final output is complicated, and requires the programmer to work in several different colour spaces along the way. In this article I’ll give a brief overview of colour spaces, and then detail a commonly overlooked area in the texture pipeline where gamma is important.
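As an illustration of why the colour space matters, here is a minimal C++ sketch of the standard sRGB transfer functions, plus an average of two sRGB-encoded values done in linear space. The function names are hypothetical, and texel averaging is only an assumed example of where gamma shows up in a texture pipeline; it is not necessarily the area the article goes on to discuss.

```cpp
#include <cassert>
#include <cmath>

// Decode an sRGB-encoded channel value in [0,1] to linear light.
float srgbToLinear(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Encode a linear-light channel value in [0,1] as sRGB.
float linearToSrgb(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return (c <= 0.0031308f) ? c * 12.92f
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Average two sRGB-encoded values: decode, average in linear space, re-encode.
float averageSrgb(float a, float b) {
    return linearToSrgb(0.5f * (srgbToLinear(a) + srgbToLinear(b)));
}
```

Averaging black (0.0) and white (1.0) this way gives roughly 0.735 rather than the 0.5 that a naive average of the encoded values would produce.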
Continuous Silhouettes
Silhouettes are commonly used for real-time shadowing algorithms. Usually these are generated from the existing edges of a mesh using the face normals. Since shading is usually interpolated over the triangle from the vertex normals, this can introduce shading artifacts where the vertex and face normals do not agree. In addition, these silhouettes move discontinuously when the light or mesh is in motion, which can cause nasty popping artifacts when using penumbra wedge soft shadows, since the projected penumbra volumes are very sensitive to the distance from the silhouette edge to the light source.
This post describes an idea of how to generate silhouette geometry using the vertex normals of a mesh. These silhouettes match the vertex lighting exactly, and also move continuously under smooth lighting or geometry changes.
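One way to read this idea is to treat N·L as a scalar stored at each vertex, interpolate it linearly across a triangle as vertex lighting would, and take the line where it crosses zero as the silhouette. The C++ sketch below does that for a single triangle. It is an assumption about the approach rather than the post's actual method: the names are hypothetical, and a directional light is used for simplicity.

```cpp
#include <cassert>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

// Find the segment inside a triangle where the linearly interpolated N.L is zero.
// p: vertex positions, n: vertex normals, lightDir: direction towards the light.
// Returns false if all three vertices face the same way relative to the light.
bool silhouetteSegment(const Vec3 p[3], const Vec3 n[3], const Vec3& lightDir,
                       Vec3& outA, Vec3& outB) {
    float d[3];
    for (int i = 0; i < 3; ++i) d[i] = dot(n[i], lightDir);

    Vec3 hits[2];
    int count = 0;
    for (int i = 0; i < 3 && count < 2; ++i) {
        const int j = (i + 1) % 3;
        if ((d[i] < 0.0f) != (d[j] < 0.0f)) {       // N.L changes sign along this edge
            const float t = d[i] / (d[i] - d[j]);   // zero crossing of the interpolant
            hits[count++] = lerp(p[i], p[j], t);
        }
    }
    if (count < 2) return false;
    outA = hits[0];
    outB = hits[1];
    return true;
}
```

Because the crossing points depend continuously on the vertex normals and light direction, the segment slides smoothly across the triangle instead of jumping between mesh edges.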
How To Fix The DirectX Rasterisation Rules
DirectX has always rasterised to render target pixel centres, and has always looked up textures from texel edges. Because of this, it is more difficult than it should be to write a flexible shader system where various portions of your pipeline are image-based. It is much simpler to work in a unified system where you write to render target pixel edges, and this article details a simple way of fixing this.
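The article's own fix is not shown in this excerpt. As an illustration only, the widely used half-pixel offset for Direct3D 9 can be sketched in C++ as below: the clip-space position is shifted by half a pixel so that the edges of the render target line up with the edges of normalised device coordinates. The names `Vec4`, `applyHalfPixelOffset` and `ndcToScreenX` are hypothetical, and the viewport is assumed to start at the origin and cover the whole target.

```cpp
#include <cassert>

// Hypothetical clip-space position type for this sketch.
struct Vec4 { float x, y, z, w; };

// Shift a clip-space position by half a pixel (Direct3D 9 convention assumed:
// pixel centres sit at integer screen coordinates, and screen Y points down).
// One pixel spans 2/width in NDC, so half a pixel is 1/width; scaling by w
// keeps the shift correct after the perspective divide.
Vec4 applyHalfPixelOffset(Vec4 clipPos, int targetWidth, int targetHeight) {
    assert(targetWidth > 0 && targetHeight > 0);
    clipPos.x -= clipPos.w / static_cast<float>(targetWidth);
    clipPos.y += clipPos.w / static_cast<float>(targetHeight);
    return clipPos;
}

// Viewport mapping from NDC x to screen-space x, for a viewport at x = 0.
float ndcToScreenX(float ndcX, int width) {
    return (ndcX + 1.0f) * 0.5f * static_cast<float>(width);
}
```

After the offset, the left edge of NDC (x = -1) lands at screen x = -0.5, which is the left edge of pixel 0 when pixel centres are at integer coordinates.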
Really Old Demos – The Buggy Demo
This demo is just here for posterity – it formed part of my August 2002 coding portfolio when I was first applying for jobs in the games industry. Be warned, there’s some seriously dodgy coding going on here…
Really Old Demos – Cubic Shadow Mapping
This demo is just here for posterity – it formed part of my August 2002 coding portfolio when I was first applying for jobs in the games industry. Be warned, there’s some seriously dodgy coding going on here…