Type trickery below the poverty threshold (ie. in the C++ type system)

It is well documented how broken the type system of C++/Java style languages is. If you are stuck with one of these languages you can take pride that when it comes to types, you are operating in a context that is a step above Python and a few floors above Javascript. Milking the staircase analogy, at the roof of the building would be Rust and in low orbit you may find languages like Haskell and OCaml. That said, there are a few nice things you can do with C++ templates that are not extremely obvious but not too magical either. In this post I will try to describe some that we have found useful at codebender, since type conversions at the C++ level are required to make FireBreath play well with libraries like serial.

Type proxy

The proxy pattern is a very useful way of hinting at the type system how to induce types. Basically the motivation is this: We have a function that takes some values and returns a type. As is common in statically typed languages we can overload this function to have different implementations depending in types of the arguments and the return type. In other words we can have different implementations for different type signatures. To briefly demonstrate:

int addOne (int n) {
    return n + 1;
}

std::string addOne (std::string s) {
    return s + "1";
}

Now the compiler will choose the right implementation based on the type of the signature. Simple. Now overloading can be used a nice metaprogramming tool. I will demonstrate with an example:

#include <vector>
#include <string>

class Box {
public:
    // Just don't let anyone subclass this
    virtual void dummy () = 0;
};

class Integer : public Box {
public:
    Integer(int x=0) : _val(x) {}
    int value () {
        return _val;
    };
private:
    int _val;
};

// Only some numbers are allowed.
class Enumeration : public Box {
public:
    Enumeration(unsigned x=0) {
        if (x > 5) x = 0;
        value = 0;
    }
    unsigned value() {
        if (_val > 5) _val = 0;
        return _val;
    }
private:
    unsigned _val;
};

class Natural : public Box {
public:
    Natural(unsigned x=0) : _val (x) {}
    unsigned value() {
        return _val;
    }
private:
    unsigned _val;
};

class Float : public Box {
public:
    Float(float x=0) : value (x) {}
    float value() {
        return _val;
    }
private:
    float _val;
};

std::vector<Box> getArguments() {
    std::vector<Box> numbers;
    numbers.push_back(Integer(1));
    numbers.push_back(Float(2));

    numbers.push_back(Natural(3));
    numbers.push_back(Enumeration(3));

    return numbers;
}

bool function(int x, float y, unsigned z, unsigned _enum) {
    return true;
}


int main(int argc, char *argv[])
{
    std::vector<Box> args = getArgument();

    return 0;
}

The Box class has 4 subclasses each one for a different type. We use this type to pass around vectors of arguments that we want to pass to C++ functions. Notice how the Natural and Enumerated boxed types both have the underlying type of unsigned.

So our task at this point is to define a function that will easily convert the Box types into C++ expected by a function, in the above example function.

template <typename T>
T convert(Box x) {
    // Code to convert
}

So here we are going to use C++'s the type inference mechanism for templates to have the compiler induce the correct type for the arguments and save us from some typing. But as we noticed above both Natural and Enumerated will translate to unsigned so this information alone isn't enough. We could pass the specific boxed type we need:

template <typename T, typename BT>
T convert(Box x) {
    BT* b = dynamic_cast<BT*>(&x);

    // Handle b being NULL
    return b->value;
}

But in practice when we say convert<Enumerated>(x) the compiler has a hard time figuring out if Enumerated corresponds to T and BT should be induced or the other way round. To avoid this ambiguity we can provide a proxy type as we would if we were using haskell. Here is how:

template <typename T, typename BT>
T convert(Box x, BT* _unused_proxy) {
    BT* b = dynamic_cast<BT*>(&x);

    // Handle b being NULL
    return b->value();
}

The _unused_proxy argument will always be a NULL pointer of type BT. This way we will always be able to explicitly tell convert which boxed type to go through before generating our value without having to also explicitly state the return value. We do this by exploiting the fact that while we don't have access to a type's inhabitants from the type itself, we can cheaply create instances of the type T* for any type T.

Type equality

Now let's talk about something slightly more advanced. There are times when we want the compiler to tell our program whether two types are actually the same. The basic idea is this; we will use overloading and templates to pattern match at the type level and generate a function that always returns true if two types are the same and false if the given types are different.

template<typename T>
struct IsEqual {
    template <typename T1>
    static const bool real_check(T1* a, T1* b) {return true;}

    template <typename T1, typename T2>
    static const bool real_check(T1* a, T2* b) {return false;}

    template <typename U>
    static const bool check () {return real_check((T*)NULL, (U*)NULL);}
};

#define EQUAL_TYPES(t1, t2) IsEqual<t1>::check<t2>()

int main(int argc, char *argv[])
{
    int type = 0;
    std::cout << EQUAL_TYPES(int, int) << std::endl;  // 1
    std::cout << EQUAL_TYPES(int, float) << std::endl;  // 0
    return 0;
}

Here we use the proxy technique discussed above to choose between real_check accepting two arguments of the same type, and real_check accepting arguments of any two types.

Is the given type an object?

Sometimes we want to know if a type is an object or not, ie. not builtin to C++. The problem with non-builtin types is that we can't just sizeof them and then know that the number returned is the number of bytes that completely describes the object (unless we count pointers but there are cases when we don't need to care about them. Consider yourself warned though). We will use a similar technique as above.

template<typename T>
struct IsObject {
    template<typename U>
    static const char true_check(void(U::*)(void));
    template<typename U>
    static const unsigned long true_check(...);

    enum { value = sizeof(true_check<T>(NULL)) == sizeof(char) };
};

#define IS_OBJECT(t1) IsObject<t1>::value

Here we do something similar as before exploiting the fact that the the compiler will be able to make sense of the void(AClass::*)(void) type, even if there are no real functions of that signatures, but not of the void(int::*)(void) type. This way for all object types the compiler will select true_check(void(U::*)(void)) instead of true_check(...).

Now there is another interesting thing about this snippet. Instead of selecting a function that always returns true we don't leave any work to be done at runtime. Since we have all the information we need we define a compile time constant in the form of an enum constant generated at compile time. We select the value based on the size of the type if the return value of true_check. Notice also how we don't even need to provide an implementation for true_check. The declaration is enough. We could have done the same trick in the previous section too but it's nice to know more than one tricks.

class Hello {};
int main(int argc, char *argv[])
{
    std::cout << IS_OBJECT(Hello) << std::endl; // 1
    std::cout << IS_OBJECT(int) << std::endl; // 0
    return 0;
}

Compile time loops

A little known feature of C++ is that they can also take values as arguments. Combined with the enum trick above this means that we one can build values using recursion. Unfortunately the maximum depth for this kind of recursion is 256 frames so the computation one can do in compile time is very limited. We can however find the fibonacci the nth number at compile time:

#include <stdio.h>

template <int N>
struct Fibonacci {
    enum { value = Fibonacci<N-1>::value + Fibonacci<N-2>::value };
};

template < >
struct Fibonacci < 1 > {
    enum { value = 1 };
};
template < >
struct Fibonacci < 0 > {
    enum { value = 1 };
};

int main(int argc, char *argv[])
{
    printf("%d\n", Fibonacci<45>::value);  // 1836311903
    return 0;
}

To prove that the number is correct here is some haskell code that computes the same:

Prelude> let fib = [1,1] ++ zipWith (+) fib (tail fib)
Prelude> fib !! 45
1836311903

To prove that that was actually compile time here is the assembly generated (although it's also self evident since there is no function to call really).

    .section  __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 11
    .globl _main
    .align 4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq  %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq   %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    subq   $32, %rsp
    leaq   L_.str(%rip), %rax
    movl   $1836311903, %ecx       ## imm = 0x6D73E55F
    movl   $0, -4(%rbp)
    movl   %edi, -8(%rbp)
    movq   %rsi, -16(%rbp)
    movq   %rax, %rdi
    movl   %ecx, %esi
    movb   $0, %al
    callq  _printf
    xorl   %ecx, %ecx
    movl   %eax, -20(%rbp)         ## 4-byte Spill
    movl   %ecx, %eax
    addq   $32, %rsp
    popq   %rbp
    retq
    .cfi_endproc

    .section   __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz "%d\n"


.subsections_via_symbols

This may not sound like much given the 256 stack size limitation but it may be possible to move to compile time some simple computations in tight loops.

Conclusion

Writing javascript all day it is refreshing to have even a primitive type system like C++'s help out every once in a while. C++11, while still quite dumb from a type theoretical standpoint, provides even more nice metaprogramming features.