It is well documented how broken the type system of C++/Java style languages is. If you are stuck with one of these languages you can take pride that when it comes to types, you are operating in a context that is a step above Python and a few floors above Javascript. Milking the staircase analogy, at the roof of the building would be Rust and in low orbit you may find languages like Haskell and OCaml. That said, there are a few nice things you can do with C++ templates that are not extremely obvious but not too magical either. In this post I will try to describe some that we have found useful at codebender, since type conversions at the C++ level are required to make FireBreath play well with libraries like serial.
Type proxy
The proxy pattern is a very useful way of hinting at the type system how to induce types. Basically the motivation is this: We have a function that takes some values and returns a type. As is common in statically typed languages we can overload this function to have different implementations depending in types of the arguments and the return type. In other words we can have different implementations for different type signatures. To briefly demonstrate:
int addOne (int n) {
return n + 1;
}
std::string addOne (std::string s) {
return s + "1";
}
Now the compiler will choose the right implementation based on the type of the signature. Simple. Now overloading can be used a nice metaprogramming tool. I will demonstrate with an example:
#include <vector>
#include <string>
class Box {
public:
// Just don't let anyone subclass this
virtual void dummy () = 0;
};
class Integer : public Box {
public:
Integer(int x=0) : _val(x) {}
int value () {
return _val;
};
private:
int _val;
};
// Only some numbers are allowed.
class Enumeration : public Box {
public:
Enumeration(unsigned x=0) {
if (x > 5) x = 0;
value = 0;
}
unsigned value() {
if (_val > 5) _val = 0;
return _val;
}
private:
unsigned _val;
};
class Natural : public Box {
public:
Natural(unsigned x=0) : _val (x) {}
unsigned value() {
return _val;
}
private:
unsigned _val;
};
class Float : public Box {
public:
Float(float x=0) : value (x) {}
float value() {
return _val;
}
private:
float _val;
};
std::vector<Box> getArguments() {
std::vector<Box> numbers;
numbers.push_back(Integer(1));
numbers.push_back(Float(2));
numbers.push_back(Natural(3));
numbers.push_back(Enumeration(3));
return numbers;
}
bool function(int x, float y, unsigned z, unsigned _enum) {
return true;
}
int main(int argc, char *argv[])
{
std::vector<Box> args = getArgument();
return 0;
}
The Box
class has 4 subclasses each one for a different type. We use this type to pass around vectors of arguments that we want to pass to C++ functions. Notice how the Natural
and Enumerated
boxed types both have the underlying type of unsigned
.
So our task at this point is to define a function that will easily convert the Box
types into C++ expected by a function, in the above example function
.
template <typename T>
T convert(Box x) {
// Code to convert
}
So here we are going to use C++'s the type inference mechanism for templates to have the compiler induce the correct type for the arguments and save us from some typing. But as we noticed above both Natural
and Enumerated
will translate to unsigned
so this information alone isn't enough. We could pass the specific boxed type we need:
template <typename T, typename BT>
T convert(Box x) {
BT* b = dynamic_cast<BT*>(&x);
// Handle b being NULL
return b->value;
}
But in practice when we say convert<Enumerated>(x)
the compiler has a hard time figuring out if Enumerated
corresponds to T
and BT
should be induced or the other way round. To avoid this ambiguity we can provide a proxy type as we would if we were using haskell. Here is how:
template <typename T, typename BT>
T convert(Box x, BT* _unused_proxy) {
BT* b = dynamic_cast<BT*>(&x);
// Handle b being NULL
return b->value();
}
The _unused_proxy
argument will always be a NULL
pointer of type BT
. This way we will always be able to explicitly tell convert
which boxed type to go through before generating our value without having to also explicitly state the return value. We do this by exploiting the fact that while we don't have access to a type's inhabitants from the type itself, we can cheaply create instances of the type T*
for any type T
.
Type equality
Now let's talk about something slightly more advanced. There are times when we want the compiler to tell our program whether two types are actually the same. The basic idea is this; we will use overloading and templates to pattern match at the type level and generate a function that always returns true if two types are the same and false if the given types are different.
template<typename T>
struct IsEqual {
template <typename T1>
static const bool real_check(T1* a, T1* b) {return true;}
template <typename T1, typename T2>
static const bool real_check(T1* a, T2* b) {return false;}
template <typename U>
static const bool check () {return real_check((T*)NULL, (U*)NULL);}
};
#define EQUAL_TYPES(t1, t2) IsEqual<t1>::check<t2>()
int main(int argc, char *argv[])
{
int type = 0;
std::cout << EQUAL_TYPES(int, int) << std::endl; // 1
std::cout << EQUAL_TYPES(int, float) << std::endl; // 0
return 0;
}
Here we use the proxy technique discussed above to choose between real_check
accepting two arguments of the same type, and real_check
accepting arguments of any two types.
Is the given type an object?
Sometimes we want to know if a type is an object or not, ie. not builtin to C++. The problem with non-builtin types is that we can't just sizeof
them and then know that the number returned is the number of bytes that completely describes the object (unless we count pointers but there are cases when we don't need to care about them. Consider yourself warned though). We will use a similar technique as above.
template<typename T>
struct IsObject {
template<typename U>
static const char true_check(void(U::*)(void));
template<typename U>
static const unsigned long true_check(...);
enum { value = sizeof(true_check<T>(NULL)) == sizeof(char) };
};
#define IS_OBJECT(t1) IsObject<t1>::value
Here we do something similar as before exploiting the fact that the the compiler will be able to make sense of the void(AClass::*)(void)
type, even if there are no real functions of that signatures, but not of the void(int::*)(void)
type. This way for all object types the compiler will select true_check(void(U::*)(void))
instead of true_check(...)
.
Now there is another interesting thing about this snippet. Instead of selecting a function that always returns true we don't leave any work to be done at runtime. Since we have all the information we need we define a compile time constant in the form of an enum
constant generated at compile time. We select the value based on the size of the type if the return value of true_check
. Notice also how we don't even need to provide an implementation for true_check
. The declaration is enough. We could have done the same trick in the previous section too but it's nice to know more than one tricks.
class Hello {};
int main(int argc, char *argv[])
{
std::cout << IS_OBJECT(Hello) << std::endl; // 1
std::cout << IS_OBJECT(int) << std::endl; // 0
return 0;
}
Compile time loops
A little known feature of C++ is that they can also take values as arguments. Combined with the enum
trick above this means that we one can build values using recursion. Unfortunately the maximum depth for this kind of recursion is 256 frames so the computation one can do in compile time is very limited. We can however find the fibonacci the nth number at compile time:
#include <stdio.h>
template <int N>
struct Fibonacci {
enum { value = Fibonacci<N-1>::value + Fibonacci<N-2>::value };
};
template < >
struct Fibonacci < 1 > {
enum { value = 1 };
};
template < >
struct Fibonacci < 0 > {
enum { value = 1 };
};
int main(int argc, char *argv[])
{
printf("%d\n", Fibonacci<45>::value); // 1836311903
return 0;
}
To prove that the number is correct here is some haskell code that computes the same:
Prelude> let fib = [1,1] ++ zipWith (+) fib (tail fib)
Prelude> fib !! 45
1836311903
To prove that that was actually compile time here is the assembly generated (although it's also self evident since there is no function to call really).
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 11
.globl _main
.align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $32, %rsp
leaq L_.str(%rip), %rax
movl $1836311903, %ecx ## imm = 0x6D73E55F
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movq %rax, %rdi
movl %ecx, %esi
movb $0, %al
callq _printf
xorl %ecx, %ecx
movl %eax, -20(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $32, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%d\n"
.subsections_via_symbols
This may not sound like much given the 256 stack size limitation but it may be possible to move to compile time some simple computations in tight loops.
Conclusion
Writing javascript all day it is refreshing to have even a primitive type system like C++'s help out every once in a while. C++11, while still quite dumb from a type theoretical standpoint, provides even more nice metaprogramming features.