Saturday 17 September 2016

A Lightweight Reflection System: Part I

In my previous post, I mentioned that one of the benefits of using C++ as a scripting language is that you can entirely avoid the need for a reflection system. Nonetheless, reflection can greatly enhance the flexibility of any code workflow and yields a more powerful scripting system. In this post, I'll be detailing the first part of a lightweight reflection layer, largely automated and with ease-of-use in mind. For anyone looking for a pre-existing reflection library, I think Ponder looks really good. I however seem to have a thing for wanting to write tiny systems that keep end-user dependencies to a minimum. The end goal is to be able to write code like this with only a handful of source file dependencies:

nano::type id    = Refl(myobj).TypeID();
const char* name = Refl(myobj).Typename();
printf( "%s : %d", name, id );
Refl(myobj).Call( "SetPosition", vec3(10, 0, 10) );

...you catch the drift.


DataProperty


First we'll start with a type-erasure class to facilitate passing data and arguments between script and host application - or even across the application's sub-systems - with embedded type safety mechanisms. As you might have guessed, this is what constitutes the signature of the Refl::Call() method previewed above.

DataProperty Refl::Call( const char* method, DataProperty& arg1 )
DataProperty Refl::Call( const char* method, DataProperty& arg1, DataProperty& arg2 ) ...etc

But more on the Refl class in the next post.


Type Erasure


At the barest level, achieving type erasure is pretty simple:

class DataProperty
{
public:
    template<typename T> DataProperty(T& data) : m_data(&data)
    {}
    template<typename T> operator T&() { return *(T*)m_data; }
private:
    void* m_data;
};

Usage
float val1 = 4;
DataProperty data = val1;
float val2 = data;


In practice, the above class would work for all Lvalue objects. But it should really be able to handle non-persistent data to avoid stray pointers. Achieving this would involve making a copy of the data, meaning implicitly knowing the data size. The class members now become:
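To make the lvalue limitation concrete, here is a minimal sketch of the pointer-only class under a different name (NaiveProperty and aliases_original are hypothetical names, not part of the final system). It shows that the wrapper merely aliases the original variable, so it is only valid for as long as that variable lives.

```cpp
// Pointer-only wrapper from the text, renamed NaiveProperty (hypothetical name)
// to keep it apart from the final DataProperty class.
class NaiveProperty
{
public:
    template<typename T> NaiveProperty(T& data) : m_data(&data) {}
    template<typename T> operator T&() { return *(T*)m_data; }
private:
    void* m_data;   // address of the caller's object, nothing is copied
};

// Returns true if writing through the wrapper changes the original variable,
// i.e. the wrapper aliases the variable rather than owning a copy of it.
inline bool aliases_original()
{
    float val = 4.0f;
    NaiveProperty data(val);
    float& ref = data;   // the conversion operator deduces T = float
    ref = 5.0f;
    return val == 5.0f;
}
```

Since nothing is copied, wrapping a temporary or a local that goes out of scope would leave m_data dangling, which is what the copy-based design is meant to avoid.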

uintptr_t m_data;
unsigned m_size;

The switch from void* to uintptr_t is so we can intuitively store numeric values in m_data (a union can be used as well). The initialisation paths of m_data can be split into three groups:

  • Numeric/Small Types Or Pointers: m_data = p_data; // store value directly. any type no larger than sizeof(uintptr_t) falls into this category.
  • Reference Types: m_data = &p_data // store address of object.
  • Big Value Types: m_data = new T(p_data) // store pointer to 'copy constructed' value.

The usage of new in the last case is a bit misleading as in practice we will only support trivially copyable types for simplicity; so we can just malloc and free custom data without having to worry about properly calling constructors or destructors. It's up to the reader to extend the class to support non-trivial types if they choose to - I've found this limitation to be perfectly acceptable for all purposes I can think of.
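The choice between the first and third groups can be made at compile time. The following is a rough sketch, assuming a hypothetical Storage enum and storage_for helper that are not part of the original code; reference types are left out because they cannot be told apart from the value type alone.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical classification of how a value would be held in m_data.
enum class Storage { Direct, HeapCopy };

template <typename T>
constexpr Storage storage_for()
{
    // Pointers and anything that fits in a uintptr_t are stored directly;
    // larger values get a heap copy.
    return sizeof(T) <= sizeof(std::uintptr_t) ? Storage::Direct
                                               : Storage::HeapCopy;
}

struct Vec4 { float x, y, z, w; };   // 16 bytes, larger than a uintptr_t

// Only trivially copyable types are supported, as stated in the text.
static_assert(std::is_trivially_copyable<Vec4>::value, "Vec4 must be trivially copyable");
static_assert(storage_for<char>() == Storage::Direct, "small value is stored directly");
static_assert(storage_for<int*>() == Storage::Direct, "pointer is stored directly");
static_assert(storage_for<Vec4>() == Storage::HeapCopy, "big value needs a copy");
```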

We fall back on SFINAE techniques and type_traits to assign m_data according to which group the assigned type falls into. There's however one more essential flaw that needs to be addressed: type safety. This is where things become significantly more involved - safety ain't cheap.


Type Identification, without RTTI


Recall that RTTI support is stripped from the scripting engine in my previous post to achieve portability and significantly smaller executables. Moreover, it's generally stripped from performance-critical applications like games to discourage the use of dynamic_cast, among other reasons. Therefore a compile-time solution is the best way to go. Most compile-time/non-RTTI typeid techniques are based on either explicit user-side declaration or automated generation methods reliant on the uniqueness (or addresses) of static variables and functions. A simple automated method just needs a template like:

template <typename T>
uintptr_t Type_ID()
{
    static char sTypeID;
    return (uintptr_t)&sTypeID; // the address is unique per T; a cast is needed to return it as an integer
}

This works really well - with ample room for extension into a partially explicit system via template specialisation (to support polymorphism for instance). However it falls short when it comes to usage across dlls and a host application. This is due to its reliance on the address of static variables or functions, as each dynamic library would hold a unique copy of the statics, meaning different addresses and ultimately different type_ids for the same type depending on where Type_ID<T>() is called from.
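As a small sketch of the address-based approach and of one possible reading of the specialisation remark, the following uses hypothetical names (AddressTypeID, Shape, Circle). Mapping a derived type onto its base's id via an explicit specialisation is an assumption about what such an extension could look like, not the author's confirmed design.

```cpp
#include <cstdint>

// Address-based id: each instantiation owns a distinct static object.
template <typename T>
std::uintptr_t AddressTypeID()
{
    static char sTag;
    return reinterpret_cast<std::uintptr_t>(&sTag);
}

// Hypothetical hierarchy used only for illustration.
struct Shape {};
struct Circle : Shape {};

// Explicit specialisation: make Circle report the same id as Shape.
template <>
inline std::uintptr_t AddressTypeID<Circle>()
{
    return AddressTypeID<Shape>();
}
```

Within a single module the ids are stable and distinct per type; across a DLL boundary each module would get its own sTag, which is the shortfall described above.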

After some experimentation, including a foolishly futile attempt to defer the call to Type_ID<T>() by wrapping it in abstract objects, I realised I'd been overlooking a compiler-provided identifier indispensable to debugging & profiling systems: __FUNCTION__ (or __PRETTY_FUNCTION__ in GCC).


Using __FUNCTION__ within the int version of Type_ID<T>, for instance, would evaluate to "Type_ID<int>" on MSVC; GCC's __FUNCTION__ gives only the bare function name, which is why __PRETTY_FUNCTION__ is needed there to get the template argument. Using this knowledge, we can auto-generate ids that are consistent across whatever execution space we're running in; it also opens up possibilities for getting the actual typename with some minor string parsing - which is skipped here for conciseness.


template <typename T>
unsigned Type_ID()
{
    static unsigned sTypeID(0);
    if(sTypeID == 0)
        sTypeID = some_char_string_hash32_function(__FUNCTION__); // as usual, FNV1 or CRC32 would suffice here.
    return sTypeID;
}
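For a self-contained version of the above, here is a minimal sketch that swaps the placeholder hash for 32-bit FNV-1a (a close relative of the FNV1 mentioned in the comment). The names fnv1a_32, HashedTypeID and TYPE_ID_FUNC_NAME are hypothetical, and the choice of __FUNCSIG__ on MSVC and __PRETTY_FUNCTION__ elsewhere is an assumption made so that the template argument appears in the hashed string.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string.
inline std::uint32_t fnv1a_32(const char* s)
{
    std::uint32_t h = 2166136261u;             // FNV offset basis
    for (; *s; ++s)
    {
        h ^= static_cast<unsigned char>(*s);   // mix in one byte
        h *= 16777619u;                        // FNV prime
    }
    return h;
}

// Pick an identifier that spells out the template argument.
#if defined(_MSC_VER)
#  define TYPE_ID_FUNC_NAME __FUNCSIG__
#else
#  define TYPE_ID_FUNC_NAME __PRETTY_FUNCTION__
#endif

template <typename T>
std::uint32_t HashedTypeID()
{
    // Hashed once per instantiation on first call.
    static const std::uint32_t sTypeID = fnv1a_32(TYPE_ID_FUNC_NAME);
    return sTypeID;
}
```

Because the id depends only on the string, two modules built with the same compiler should agree on it. Hash collisions between different types are possible in principle, so a production system may want to detect them.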


Bringing it all together


With all this in place, we can now finalise our DataProperty class. First, the SFINAE m_data assignment will be outsourced to external template functions, as C++ doesn't support partial specialisation of function templates (member or otherwise).

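#include <stdint.h>     // uintptr_t
#include <stdlib.h>     // malloc, free
#include <string.h>     // memcpy
#include <type_traits>

namespace utils
{
    template <typename T = void> using if_pointer_t = typename std::enable_if< std::is_pointer<T>::value, T>::type;
    template <typename T = void> using if_value_t = typename std::enable_if< !std::is_pointer<T>::value, T>::type;

    // Both overloads take T in a non-deduced context, so they must be called as construct<T>(...).
    // (The undefined catch-all declaration has been dropped; it could only compete with the overloads below.)

    // pointers: store the pointer value itself
    template<typename T> void construct(uintptr_t& dest, unsigned& dest_size, if_pointer_t<T>& p_data)
    {
        dest = (uintptr_t)p_data;
        dest_size = 0; // make pointers zero-sized for identification/type-checking
    }

    // values: store small ones in place, copy big ones to the heap
    template<typename T> void construct(uintptr_t& dest, unsigned& dest_size, if_value_t<T>& p_data)
    {
        static_assert( std::is_trivially_copyable<T>::value, "unsupported for non-trivial types" );
        const unsigned size = sizeof(p_data);
        if (size > sizeof(uintptr_t))
        {
            dest = (uintptr_t)malloc(size);
            memcpy((void*)dest, &p_data, size); // portable replacement for the MSVC-only memcpy_s
        }
        else
        {
            dest = 0; // clear unused bytes, then copy the value into m_data itself
            memcpy(&dest, &p_data, size);
        }
        dest_size = size;
    }
}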


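class DataProperty
{
public:
    template<typename T> DataProperty(T& data)
    {
        utils::construct<T>(m_data, m_size, data);
        m_typeid = Type_ID<T>();
    }

    ~DataProperty()
    {
        // only big value types own a heap allocation
        if (m_size > sizeof(uintptr_t))
            free((void*)m_data);
    }

    template<typename T> operator T&()
    {
        if(Type_ID<T>() != m_typeid)
            throw "incompatible types";
        // big value types live behind the pointer held in m_data;
        // pointers and small values live in m_data itself
        if (m_size > sizeof(uintptr_t))
            return *(T*)m_data;
        return *(T*)&m_data;
    }
private:
    uintptr_t m_data;
    uintptr_t m_typeid;
    unsigned m_size;
};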
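To show the intended behaviour end to end, here is a condensed, self-contained sketch. Property, TypeTag, Vec4 and the two helper functions are hypothetical names; the type id is the simple address-based variant (fine within one module), pointers are simply stored as small values rather than being marked zero-sized, and copying is disabled since safe copies are not covered yet.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Stand-in type id: address of a per-type static.
template <typename T>
std::uintptr_t TypeTag()
{
    static char sTag;
    return reinterpret_cast<std::uintptr_t>(&sTag);
}

class Property   // hypothetical condensed take on DataProperty
{
public:
    template <typename T> explicit Property(T& data)
        : m_data(0), m_typeid(TypeTag<T>()), m_size(sizeof(T))
    {
        static_assert(std::is_trivially_copyable<T>::value, "trivially copyable types only");
        void* dest = &m_data;                      // small values live in m_data itself
        if (sizeof(T) > sizeof(m_data))
        {
            dest = std::malloc(sizeof(T));         // big values get a heap copy
            if (!dest) throw "out of memory";
            m_data = reinterpret_cast<std::uintptr_t>(dest);
        }
        std::memcpy(dest, &data, sizeof(T));
    }
    Property(const Property&) = delete;            // copying is out of scope here
    Property& operator=(const Property&) = delete;
    ~Property()
    {
        if (m_size > sizeof(m_data)) std::free(reinterpret_cast<void*>(m_data));
    }

    template <typename T> operator T&()
    {
        if (TypeTag<T>() != m_typeid) throw "incompatible types";
        // Mirrors the article's cast-based access to the stored bytes.
        return m_size > sizeof(m_data) ? *reinterpret_cast<T*>(m_data)
                                       : *reinterpret_cast<T*>(&m_data);
    }
private:
    std::uintptr_t m_data, m_typeid;
    unsigned m_size;
};

struct Vec4 { float x, y, z, w; };

// A small value, a big value and a pointer all read back as stored.
inline bool round_trips()
{
    float f = 4.0f;
    Vec4 v = {1.0f, 2.0f, 3.0f, 4.0f};
    int n = 7;
    int* ptr = &n;
    Property pf(f), pv(v), pp(ptr);
    float& rf = pf;
    Vec4& rv = pv;
    int*& rp = pp;
    return rf == 4.0f && rv.w == 4.0f && rp == &n;
}

// Reading a float property as a double is rejected at runtime.
inline bool rejects_mismatch()
{
    float f = 4.0f;
    Property pf(f);
    try { double& d = pf; (void)d; }
    catch (const char*) { return true; }
    return false;
}
```

Both helpers are expected to return true: the first exercises the in-place and heap paths, the second the type check in the conversion operator.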


In the next post of this series, I'll be discussing the rest of the reflection system. This post has skipped a few important details about DataProperty; for instance, we currently can't read a pointer-initialised m_data as a value - it must be read as a pointer. This is because Type_ID<T>() yields a different id from Type_ID<T*>(). It would also be nice to safely make copies of DataProperty and to support reference type initialisation. This will all be covered in the final source code that'll be made available in the next post of the series. Peace.
