Why You Should Always Use static_cast

1 May, 2004

I'm writing this to document my experience with the use of casting in large class hierarchies, specifically those that use multiple inheritance. I'll be mainly having a go at anyone who likes to use C-style casts to move around their class hierarchy. :)

First I'll detail the correct way to traverse between multiple base classes, then we'll have a look at what the compiler generates to implement multiple inheritance. Finally we'll see what C-style casts do in this implementation, and discover a whole new world of pain.

If You've Ever Used The ATL, You Should Already Know This

Let's consider a simple case of a class that implements two interfaces:

#include <iostream>

// first base class
class A
{
public:
    virtual void f() = 0;
};

// second base class
class B
{
public:
    virtual void g() = 0;
};

// derived class
class C : public A, public B
{
public:
    void f() { std::cout << "f" << std::endl; }
    void g() { std::cout << "g" << std::endl; }
};

Now suppose we have a pointer to an instance of C through the interface of A, and we wish to obtain a pointer to the same object through the interface of B. A and B are unrelated types, so static_cast cannot convert between them directly; instead we cast through a pointer to class C, and the conversion from C * to B * then happens implicitly.

A *a = new C;
B *b = static_cast<C *>(a);

This lets the compiler know what route you are taking through the hierarchy, so it can properly convert between pointer types.
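A minimal self-contained sketch of this cross-cast, assuming a C++11 compiler. The helper names a_to_b and a_to_b_c_style are hypothetical, and virtual destructors have been added to A and B (they are not in the classes above). For contrast it also includes a C-style cast from A * to B *: since A and B are unrelated, that cast cannot be a static_cast and is performed as a reinterpret_cast, so no pointer adjustment happens.

```cpp
#include <iostream>

class A
{
public:
    virtual ~A() {}  // added: allows safe deletion through A *
    virtual void f() = 0;
};

class B
{
public:
    virtual ~B() {}  // added: allows safe deletion through B *
    virtual void g() = 0;
};

class C : public A, public B
{
public:
    void f() { std::cout << "f" << std::endl; }
    void g() { std::cout << "g" << std::endl; }
};

// Hypothetical helper: cross-cast from the A interface to the B interface.
// The compiler checks that the route A -> C exists (but not that a really
// points into a C); the implicit C * -> B * conversion that follows moves
// the pointer to the B subobject.
B *a_to_b(A *a)
{
    return static_cast<C *>(a);
}

// Hypothetical contrast: A and B are unrelated, so this C-style cast is
// performed as a reinterpret_cast. The address is not adjusted, and calling
// g() through the result is undefined behaviour.
B *a_to_b_c_style(A *a)
{
    return (B *)a;
}
```

With `C c; a_to_b(&c)->g();` the call prints `g`, whereas `a_to_b_c_style(&c)` still holds the address of the A subobject.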

A slightly more contrived example is a diamond-shaped class hierarchy built without virtual inheritance. Consider the following code:

// common base class
class D
{
public:
    virtual void h() = 0;
};

// first intermediate class
class E : public D
{
public:
    void h() { std::cout << "E::h" << std::endl; }
};

// second intermediate class
class F : public D
{
public:
    void h() { std::cout << "F::h" << std::endl; }
};

// derived class
class G : public E, public F {};

Now suppose we wish to cast from a pointer to class G to a pointer to class D. This is a problem because G actually contains two D subobjects (since E and F do not inherit from D virtually), so you have to tell the compiler which one you want. For example, if you need the D that belongs to the E part of G, cast like this:

G *g = new G;
D *d = static_cast<E *>(g);
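A sketch that makes both routes visible, assuming C++11. To keep the result checkable, h() here returns a std::string instead of printing (a change from the classes above), and d_via_e and d_via_f are hypothetical helper names. Writing static_cast<D *>(g) directly would be rejected, because D is an ambiguous base of G.

```cpp
#include <string>

class D
{
public:
    virtual ~D() {}
    virtual std::string h() = 0;
};

class E : public D
{
public:
    std::string h() { return "E::h"; }
};

class F : public D
{
public:
    std::string h() { return "F::h"; }
};

// G holds two separate D subobjects: one inside E, one inside F.
class G : public E, public F {};

// Reach the D that lives inside the E part of G.
D *d_via_e(G *g) { return static_cast<E *>(g); }

// Reach the D that lives inside the F part of G.
D *d_via_f(G *g) { return static_cast<F *>(g); }
```

With `G g;`, `d_via_e(&g)->h()` returns "E::h" and `d_via_f(&g)->h()` returns "F::h", and the two D pointers compare unequal because they refer to different subobjects.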

Pointer Values Can Change During Casts

When using a base class pointer, the compiler must be able to access the base data and virtual methods regardless of which derived class we actually have an instance of. For this reason the virtual function table pointer(s) and data members of a given class are usually laid out as one contiguous block inside the derived class's memory layout, and that block looks the same in every class that inherits from this base.

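Hence, for classes that use multiple inheritance, pointer values must change along certain routes through the class hierarchy: two non-empty base class subobjects cannot share an address, so at most one of them can start where the derived object starts. In typical implementations the adjustment is needed for every base class apart from the first one, but compilers are free to lay objects out as they please.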
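The adjustment can be measured directly. This is a sketch assuming C++11 and a platform where converting a pointer to std::intptr_t gives its plain address (true on mainstream compilers, but implementation-defined); address_of, offset_of_a and offset_of_b are hypothetical helpers, and the classes are cut-down versions of A, B and C from earlier.

```cpp
#include <cstdint>

class A
{
public:
    virtual ~A() {}
    virtual void f() = 0;
};

class B
{
public:
    virtual ~B() {}
    virtual void g() = 0;
};

class C : public A, public B
{
public:
    void f() {}
    void g() {}
};

// Hypothetical helper: the address of p as an integer, for comparison only.
std::intptr_t address_of(const void *p)
{
    return reinterpret_cast<std::intptr_t>(p);
}

// How many bytes the conversion C * -> A * moves the pointer.
std::intptr_t offset_of_a(C *c)
{
    return address_of(static_cast<A *>(c)) - address_of(c);
}

// How many bytes the conversion C * -> B * moves the pointer.
std::intptr_t offset_of_b(C *c)
{
    return address_of(static_cast<B *>(c)) - address_of(c);
}
```

On common 64-bit ABIs offset_of_a typically reports 0 and offset_of_b reports 8 (one vtable pointer's worth), but the language only guarantees that the two offsets differ.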

The saving grace is that static_cast forces the compiler to work out a unique route through the class hierarchy; if no route exists, or more than one does, you get a compile-time error. Drawing a diagram to identify the junctions you need to tell it about is usually simple, and the answer is usually just to static_cast to one intermediate class en route and let the compiler work out the rest.

C-Style Casts Are reinterpret_cast Operators (In General), And Hence Evil

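A C-style cast performs the best cast it can, given the information available at the point where it is used. The standard tries const_cast, then static_cast, then reinterpret_cast (the last two possibly followed by some const_cast evilness) and uses the first interpretation that applies. Worse, when one of the class types involved is incomplete, it is explicitly unspecified whether the static_cast or the reinterpret_cast interpretation is chosen, even if the classes are related by inheritance. So if there is complete, non-contradictory information connecting the two types, a static_cast will be used; otherwise you will get a reinterpret_cast, typically with no warning! A simple example is the following: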

class Base;
class Derived; // derived from Base

class Foo
{
public:
    // assume we know m_base really points into a Derived
    Derived *GetDerivedPtr() { return (Derived *)m_base; }

private:
    Base *m_base;
};

The coder who wrote the above may know that Derived is derived from Base, but at this point the compiler does not, so in practice it will silently implement this cast expression as a reinterpret_cast. In general this will break your code (since pointer values can change during casts), and there will be much head-scratching and keyboard-destroying while you wonder why your compiler seems to have generated fundamentally broken code without so much as a warning. Had a static_cast been used above, compilation would have failed, since the compiler knows nothing of the route through the class hierarchy that connects Base to Derived.

To further complicate matters, on most compilers if Base contained a virtual method and Derived only derived from Base then the above code would probably work. Only when Derived inherited from another class, or perhaps if you changed compiler version, would the above code stop working. Tracking down this sort of bug is virtually impossible since no warnings are ever produced.
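One way to keep the accessor while letting the compiler check the route is to define it out of line, after Derived is a complete type, using static_cast. In this sketch the accessor is called GetDerivedPtr (it returns a Derived *), and the Foo constructor and the class Other are hypothetical additions: Other is made the first base of Derived so that, on common ABIs, Base sits at a non-zero offset, which is exactly the situation where a reinterpret_cast goes wrong.

```cpp
class Base
{
public:
    virtual ~Base() {}
};

class Derived;  // only a forward declaration is visible to Foo

class Foo
{
public:
    explicit Foo(Base *base) : m_base(base) {}

    // Declared here, defined below once Derived is complete.
    Derived *GetDerivedPtr();

private:
    Base *m_base;
};

// Hypothetical extra base so that Base is not the first base of Derived.
class Other
{
public:
    virtual ~Other() {}
};

class Derived : public Other, public Base {};

// Derived is complete here, so static_cast knows the route Base -> Derived
// and adjusts the pointer. Moving this definition above Derived's definition
// turns it into a compile error rather than a silent reinterpret_cast.
Derived *Foo::GetDerivedPtr()
{
    // assume we know m_base really points into a Derived
    return static_cast<Derived *>(m_base);
}
```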

Conclusion

If you are serious about producing C++ code that does not exploit compiler-specific evilness and still works when you need to implement interfaces from more than one class hierarchy at a time, you will never use a C-style cast again. You'll also never store anything in void* pointers, since unless you cast back to exactly the same type you stored, you've lost your pointer again.
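A sketch of the void* trap, assuming C++11 and cut-down versions of A, B and C from earlier; stash, recover_b and recover_b_wrong are hypothetical helpers standing in for something like a C-style callback or user-data slot. The round trip is only safe if the pointer comes back out as the exact type that went in.

```cpp
class A
{
public:
    virtual ~A() {}
    virtual void f() = 0;
};

class B
{
public:
    virtual ~B() {}
    virtual void g() = 0;
};

class C : public A, public B
{
public:
    void f() {}
    void g() {}
};

// Hypothetical helper: store an interface pointer in an untyped slot.
void *stash(A *a) { return a; }

// Correct: restore exactly the type that went in (A *), then take the
// known route A -> C -> B so the compiler can adjust the address.
B *recover_b(void *p)
{
    A *a = static_cast<A *>(p);
    return static_cast<C *>(a);
}

// Wrong: pretends the void * held a B *. No adjustment happens, so the
// result still points at the A subobject.
B *recover_b_wrong(void *p) { return static_cast<B *>(p); }
```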