DXT Compression Techniques

This article presents an explanation of two techniques that can be used to perform DXT colour compression. They were designed during the development of an open source DXT compression library called squish.

What Is DXT Compression?

DXT compression is a very well-designed compressed image format for colour images with an optional alpha channel. The official spec can be downloaded from here.

This article will deal with colour compression as this is the most difficult part of writing a DXT compressor.

The DXT Colour Block

DXT is a block compression scheme, where each block encodes a 4x4 set of pixels. Each block is compressed by picking a start and end colour at 565 precision (that is, 5 bits for red, 6 for green and 5 for blue) and considering up to two full-precision intermediate colours. Each pixel in the block is then encoded with a 2-bit index into these 4 colours.

This allows 16 pixels to be encoded in 8 bytes of data, giving either 6:1 or 8:1 compression (depending on whether you are including alpha).

Principal Components

Each colour in the block can be considered as a point in a 3-dimensional space of (red, green, blue). The compressed colours we use to represent this block must lie along a straight line through this space, the direction of which must well-capture the variation in the block.

A technique called principal component analysis can find the direction along which the points vary the most. This direction, called the principal axis, is likely to be very close to the direction of the line through the compressed endpoints, so we use this to bootstrap two different methods to find good endpoints.

Compression Technique 1: Range Fit

If we take the minimum and maximum point along the principal axis, we can use these as the endpoints directly. Although there may be better endpoints that were not part of the original point set, this algorithm avoids a length search process for better endpoints.

Uncompressed Image (Zoomed x3)

Above is the 128x128 source image, zoomed up by 3x. Below is the same image compressed using DXT1 and the range fit algorithm.

Range Fit Compressed Result (Zoomed x3)

The range fit algorithm produces an RMS error of 8.429.

The pseudo-code for this algorithm is as follows:

n = ComputePrincipleAxis()
a, b = ComputeExtremePointsOnAxis( n )
ComputeDxtPointsFromEndPoints( a, b )
for each point:
    index[point] = GetNearestDxtPoint();

A possible improvement to this algorithm would be to iteratively try to improve the endpoints using a local search of nearby colours. Although this can indeed improve the results it is difficult to find a global optimum, as the closest index for each point can change during the search. A better approach is to take advantage of the low number of intermediate points, and this will be discussed next.

Compression Technique 2: Cluster Fit

The previous technique took endpoints from the original colour set and fitted a set of indices to the resulting compressed colours. This technique turns this process around, using an existing set of indices to decide where to place the endpoints.

Testing all possible indices would take far too long, but we can use the principal axis to reduce this set down immensely. If we assume that the principal axis is very similar to the direction of the line through optimal endpoints, we can also assume that a total ordering of the original colour set in these directions is also very similar.

So this technique uses the principal axis to define a total ordering on the original colour set. We then test all possible ways of clustering the original points that preserve this total ordering, and fit endpoints to each generated index set using least squares.

Uncompressed Image (Zoomed x3)

Above is the 128x128 source image, zoomed up by 3x. Below is the same image compressed using DXT1 and the cluster fit algorithm.

Cluster Fit Compressed Result (Zoomed x3)

The cluster fit algorithm produces an RMS error of 6.79. The same image compressed with the NVIDIA tool (nvdxt v7.33) produces an RMS error of 7.06.

n = ComputePrincipleAxis()
ordering = ComputeTotalOrderingFromAxis( n )
best = null
for each clustering that preserves ordering:
    indices = GetIndicesFromClustering( clustering )
    block = LeastSquaresFitDxtBlockUsingIndices( indices )
    if error( block ) < error( best ):
        best = block

A possible improvement to this algorithm would be to test the total ordering using the best endpoints against the one from the principal axis. If they are different, we could run the algorithm again to try and reduce the error. This extension has not yet been tested.

Update: this has been implemented in squish 1.9. For the images tested, the algorithm iterates around 1.6 times per block.

Single Colour Compression

The squish library also contains a provably optimal single colour compressor. This is driven by lookup tables that give an optimal start and end point for each index of 3 or 4-colour DXT colour compression. The pseudo-code for this algorithm and an n-colour block is:

tables = GetLookupTables( n )
best = null
for index in range( 0, n - 1 ):
    block = LookupBlockEndpointsForIndex( index )
    if error( block ) < error( best ):
        best = block

The lookup tables gives the 5 or 6 bit endpoints that generate the closest point to the target for the given index. These are generated by lookup over all possible 5 and 6 bit endpoints and storing the exact intermediate colours they generate. The gaps are then filled in using the closest match. Since any target point must share the index between all components of the colour, simply testing each index for lowest overall error garuantees we get an optimal encoding.

Testing 1000 random target colours and DXT1 compression using the range fit algorithm generates the following statistics:

min = 0, max = 6, rms = 3.58148

By using the single colour lookup tables, the same colours can be compressed with the following statistics:

min = 0, max = 1.73205, rms = 0.930054

So the optimal rms error is under 1 bit, which is fairly impressive given that the endpoints are either 2 or 3 bits less precise than the target point.

Source Code

The algorithms in this article have been implemented and optimised during the development of the squish library.