nanoFracture

Saturday, 17 September 2016

Templates and Build Times

Quick iteration is probably one of the most important ingredients of successful games development, and nothing kills coder iteration like long build times. For this reason some studios ban templates outright, or at least frown on them, since they can be a severe compile bottleneck.

However, the real reason templates can cripple compile times is a problem at the core of C++ header parsing. Template code traditionally lives entirely in the header file, even though it doesn't need to! In fact templates are one of the few places where the use of inline (.inl) files actually makes sense to me. By separating the template implementation into an inline file, you can keep the dependent header files light, while only the source (.cpp) files that actually use the template implementations need to include the .inl file. I'd say as a general best practice, you should never include inline files in a header file anyway; else what's the point of the modularity/code-separation?

Code Example

foo_template.h
template<typename T> class foo_template
{
public:
void DoSomething();
};

foo_template.inl
#include <cstdio>
template<typename T>
inline void foo_template<T>::DoSomething() { printf("doing something y'all"); }

foo.h
#include "foo_template.h"
class Foo
{
foo_template<int> mFoo; // any concrete type will do; Foo itself is not a template
public:
void DoSomething();
};

foo.cpp
#include "foo.h"
#include "foo_template.inl"
void Foo::DoSomething() { mFoo.DoSomething(); }

A Lightweight Reflection System: Part I

In my previous post, I mentioned that one of the benefits of using C++ as a scripting language is that you can entirely avoid the need for a reflection system. Nonetheless, reflection can greatly enhance the flexibility of any code workflow, also yielding a more powerful scripting system. In this post, I'll be detailing the first part of a lightweight reflection layer, largely automated and with ease-of-use in mind. For anyone looking for a pre-existing reflection library, I think Ponder looks really good. I however seem to have a thing for wanting to write tiny systems that keep end-user dependencies to a minimum. The end goal is to be able to write code like the following with only a handful of source file dependencies:

nano::type id = Refl(myobj).TypeID();
const char* name = Refl(myobj).Typename();
printf( "%s : %d", name, id );
Refl(myobj).Call( "SetPosition", vec3(10, 0, 10) );

...you catch the drift.

DataProperty

First we'll start with a type-erasure class to facilitate passing data and arguments between script and host application - or even across the application's sub-systems - with embedded type safety mechanisms. As you might have guessed, this is what constitutes the signature of the Refl::Call() method previewed above.

DataProperty Refl::Call( const char* method, DataProperty& arg1 )
DataProperty Refl::Call( const char* method, DataProperty& arg1, DataProperty& arg2 ) ...etc

But more on the Refl class in the next post.

Type Erasure

At the barest level, achieving type erasure is pretty simple:

class DataProperty
{
public:
template<typename T> DataProperty(T& data) : m_data(&data)
{}
template<typename T> operator T&() { return *(T*)m_data; }
private:
void* m_data;
};

Usage
float val1 = 4;
DataProperty data = val1;
float val2 = data;

In practice, the above class would work for all lvalue objects. But it should really be able to handle non-persistent data too, to avoid stray pointers. Achieving this involves making a copy of the data, which in turn means knowing the data size. The class members now become:

uintptr_t m_data;
unsigned m_size;

The switch from void* to uintptr_t is so we can intuitively store numeric values in m_data (a union could be used as well). The initialisation paths of m_data can be split into three groups:

Numeric/Small Types Or Pointers: m_data = p_data; // store value directly. Any type no larger than sizeof(uintptr_t) falls into this category.
Reference Types: m_data = &p_data // store address of object.
Big Value Types: m_data = new T(p_data) // store pointer to 'copy constructed' value.

The usage of new in the last case is a bit misleading as in practice we will only support trivially copyable types for simplicity; so we can just malloc and free custom data without having to worry about properly calling constructors or destructors. It's up to the reader to extend the class to support non-trivial types if they choose to - I've found this limitation to be perfectly acceptable for all purposes I can think of.

We fall back on SFINAE techniques and type_traits to assign m_data according to which group the assigned type falls into. There is, however, one more essential flaw that needs to be addressed: type safety. This is where things become significantly more involved - safety ain't cheap.
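As a rough illustration of how type_traits can sort a type into one of the three groups at compile time, here is a minimal sketch. StorageKind, storage_kind and Vec4 are hypothetical names used only for this example; they are not part of the final DataProperty class.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical names for illustration; not part of the final DataProperty.
enum class StorageKind { Inline, Reference, HeapCopy };

template <typename T>
constexpr StorageKind storage_kind()
{
    // References are checked first: sizeof(T&) reports the size of the referred-to type.
    return std::is_reference<T>::value ? StorageKind::Reference
         : (std::is_pointer<T>::value || sizeof(T) <= sizeof(std::uintptr_t)) ? StorageKind::Inline
         : StorageKind::HeapCopy;
}

struct Vec4 { float x, y, z, w; }; // 16 bytes, larger than a uintptr_t

static_assert(storage_kind<float>() == StorageKind::Inline,    "small value is stored directly");
static_assert(storage_kind<Vec4*>() == StorageKind::Inline,    "pointer is stored directly");
static_assert(storage_kind<Vec4&>() == StorageKind::Reference, "reference stores the address");
static_assert(storage_kind<Vec4>()  == StorageKind::HeapCopy,  "big value gets its own copy");
```

The same conditions can be fed to std::enable_if to select an overload instead of returning an enum.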

Type Identification, without RTTI

Recall that RTTI support was stripped from the scripting engine in my previous post to achieve portability and significantly smaller executables. It's also generally stripped from performance-critical applications like games, among other reasons to discourage the use of dynamic_cast. A compile-time solution is therefore the best way to go. Most compile-time/non-RTTI typeid techniques are based on either explicit user-side declaration or automated generation methods reliant on the uniqueness (or addresses) of static variables and functions. A simple automated method just needs a template like:

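template <typename T>
uintptr_t Type_ID()
{
static char sTypeID; // one distinct static per instantiation of T
return reinterpret_cast<uintptr_t>(&sTypeID); // its address serves as the id
}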

This works really well - with ample room for extension into a partially explicit system via template specialisation (to support polymorphism for instance). However it falls short when it comes to usage across dlls and a host application. This is due to its reliance on the address of static variables or functions, as each dynamic library would hold a unique copy of the statics, meaning different addresses and ultimately different type_ids for the same type depending on where Type_ID<T>() is called from.

After some experimentation, including a foolishly futile attempt to defer the call to Type_ID<T>() by wrapping it in abstract objects, I realised I'd been overlooking a predefined identifier that's indispensable to debugging & profiling systems: __FUNCTION__ (or __PRETTY_FUNCTION__ in GCC).

Evaluating __FUNCTION__ within the int version of Type_ID<T>, for instance, yields "Type_ID<int>". Using this knowledge, we can auto-generate ids that are consistent across whatever execution space we're running in; it also opens up possibilities for getting the actual typename with some minor string parsing, which is skipped here for conciseness.

template <typename T>
unsigned Type_ID()
{
static unsigned sTypeID(0);
if(sTypeID == 0)
sTypeID = some_char_string_hash32_function(__FUNCTION__); // as usual, FNV1 or CRC32 would suffice here.
return sTypeID;
}
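For completeness, here is one way the hash placeholder above could be filled in. This is a sketch under assumptions: it uses FNV-1a (a close variant of the FNV1 mentioned in the comment), fnv1a32 and NANO_FUNC_SIGNATURE are hypothetical names, and it picks __FUNCSIG__ on MSVC or __PRETTY_FUNCTION__ elsewhere so that the string is certain to contain the template argument.

```cpp
#include <cassert>
#include <cstdint>

// FNV-1a, 32-bit: XOR each byte into the hash, then multiply by the FNV prime.
inline std::uint32_t fnv1a32(const char* str)
{
    assert(str != nullptr);
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (; *str; ++str)
    {
        hash ^= static_cast<unsigned char>(*str);
        hash *= 16777619u;              // FNV prime
    }
    return hash;
}

// Assumption: choose an identifier that spells out the template argument.
#if defined(_MSC_VER)
    #define NANO_FUNC_SIGNATURE __FUNCSIG__
#else
    #define NANO_FUNC_SIGNATURE __PRETTY_FUNCTION__
#endif

template <typename T>
std::uint32_t Type_ID()
{
    // Hashed once per instantiation; the string is derived from T rather than from an address.
    static const std::uint32_t sTypeID = fnv1a32(NANO_FUNC_SIGNATURE);
    return sTypeID;
}
```

Initialising the static directly from the hash also sidesteps the zero sentinel, so a hash that happens to be 0 is not recomputed on every call. The ids are only comparable between modules built with the same compiler, since the signature string is compiler-specific.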

Bringing it all together

With all this in place, we can now finalise our DataProperty class. First the SFINAE m_data assignment will be outsourced to external template functions as C++ understandably doesn't support partial specialisation of template class methods.

#include <stdint.h>    // uintptr_t
#include <stdlib.h>    // malloc, free
#include <string.h>    // memcpy_s (MSVC)
#include <type_traits>

namespace utils
{
template <typename T = void> using if_pointer_t = typename std::enable_if< std::is_pointer<T>::value, T>::type;
template <typename T = void> using if_value_t = typename std::enable_if< !std::is_pointer<T>::value, T>::type;

// Pointers: store the pointer value itself.
template<typename T> void construct(uintptr_t& dest, unsigned& dest_size, if_pointer_t<T>& p_data)
{
dest = (uintptr_t)p_data;
dest_size = 0; // make pointers zero-sized for identification/type-checking
}

// Values: store small ones in place, copy big ones to the heap.
template<typename T> void construct(uintptr_t& dest, unsigned& dest_size, if_value_t<T>& p_data)
{
const unsigned size = sizeof(p_data);
if (size > sizeof(uintptr_t))
{
static_assert( std::is_trivially_copyable<T>::value, "unsupported for non-trivial types" );
dest = (uintptr_t)malloc(size);
memcpy_s((void*)dest, size, &p_data, size);
}
else
{
(T&)dest = p_data;
}
dest_size = size;
}
}

class DataProperty
{
public:
template<typename T> DataProperty(T& data)
{
utils::construct<T>(m_data, m_size, data);
m_typeid = Type_ID<T>();
}

~DataProperty()
{
if (m_size > sizeof(uintptr_t))
free((void*)m_data);
}

template<typename T> operator T&()
{
if(Type_ID<T>() != m_typeid)
throw "incompatible types";
if (m_size > sizeof(uintptr_t))
return *(T*)m_data; // big value: m_data points at the heap copy
return (T&)m_data; // pointer or small value: stored in m_data itself
}
private:
uintptr_t m_data;
unsigned m_typeid; // matches the return type of Type_ID<T>()
unsigned m_size;
};

In the next post of this series, I'll be discussing the rest of the reflection system. This post has skipped a few important details about DataProperty; for instance, we currently can't read a pointer-initialised m_data as a value - it must be read as a pointer. This is because Type_ID<T>() has a different signature to Type_ID<T*>(). It would also be nice to safely make copies of DataProperty and to support reference type initialisation. This will all be covered in the final source code that'll be made available in the next post of the series. Peace.

Wednesday, 17 August 2016

Hot Reloadable C++ Modules

The concept of C++ 'scripting' is nothing new; in fact some of you may be familiar with Molecular Musings and RuntimeCompiledCPlusPlus (amongst others), who have successfully implemented this to varying degrees.

My job here, however, is to give an end-to-end walk-through of implementing a basic version, sample source included.

Rather than place code snippets in this article, I'll mostly explain some of the gotchas and overall architecture with some insights into design decisions. If you're like me & you learn better by muddling through barely commented code, feel free to skip this and head straight to bitbucket as the code flow is pretty self-explanatory.

The crux of the idea is simple: most operating systems expose runtime APIs for loading & unloading dynamic libraries as well as for launching command line processes. Nothing thus prevents us from firing off a compile process, loading the resulting library, executing it, editing the source, recompiling, reloading it, executing again, and repeating - all at runtime. Obviously there are caveats involved, and I'd strongly suggest reading Molecular Musings for a more complete perspective - especially if you're also wondering: why bother - why not just use an actual scripting language?

Personally, my motivation for pursuing this was a mix of curiosity & laziness; scripting languages often require significant setting up for bindings and add considerable bloat; C++ modules on the other hand have the advantage of being native & incredibly lightweight, especially for small projects. Out-of-the-box parsing, intrinsic debugging, no direct need for reflection, just 3 extra source files. Let's go get it.

We'll be working with Windows in this case, but the concept's directly applicable to other modern operating systems.

Architecture

Our system is comprised of two major classes:

ScriptInterface: The base class for all scripts; a fully abstract class that lives in the Engine.

ScriptLoader: For compiling, loading and extracting the ScriptInterface of a given module file. Can live anywhere but makes sense in the Engine.

Bare Simplicities

A third useful class, largely ignored here for the sake of simplicity, is:

ScriptEnvironment: Exposes relevant systems to scripts, e.g. allocators, schedulers, factories, renderer(s). Implementation details will be left to the reader/user, as you'll find that the power of your scripts relies almost directly on the systems exposed via this class.

For additional simplicity, we'll be working with stateless scripts, avoiding the need for any sort of serialisation during reloads. Rather we'll adopt a WPF style DataContext - which for our intents and purposes is just a void pointer. It's fine; the script and its client know exactly what's in that void, you can cast it; I believe in you. This DataContext can serve as a light interchange layer and/or state storage if necessary. We also ignore error handling to some extent.

ScriptInterface

This simply contains three major abstract functions: OnInitialise, OnDestroy, OnUpdate. You can add more based on your project's needs. Perhaps even some support for reflection - for easier extensibility? Not sure there's any more to be said on this. Next.
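To make the shape of the interface concrete, here is a minimal sketch. The article only names the three callbacks, so the parameter lists, the protected destructor, CounterContext and CounterScript are all assumptions made for illustration; the sample on bitbucket may differ.

```cpp
#include <cassert>

// Assumption: parameter lists are illustrative; only the three callback names come from the article.
class ScriptInterface
{
public:
    virtual void OnInitialise(void* dataContext) = 0;
    virtual void OnUpdate(float deltaSeconds) = 0;
    virtual void OnDestroy() = 0;
protected:
    ~ScriptInterface() {} // scripts are torn down via OnDestroy, never deleted through the base
};

// Hypothetical state shared between host and script through the DataContext.
struct CounterContext { int ticks; };

// A script that keeps its persistent state in the DataContext rather than in itself.
class CounterScript : public ScriptInterface
{
public:
    void OnInitialise(void* dataContext) override
    {
        assert(dataContext != nullptr);
        m_context = static_cast<CounterContext*>(dataContext); // both sides agree on the type
        m_context->ticks = 0;
    }
    void OnUpdate(float) override
    {
        assert(m_context != nullptr);
        ++m_context->ticks;
    }
    void OnDestroy() override { m_context = nullptr; }
private:
    CounterContext* m_context = nullptr;
};
```

Because the tick count lives in the host-owned CounterContext, a reload in this sketch only has to hand the same pointer to the freshly loaded script.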

Fast and tiny

If you're into embedded programming, you'll probably appreciate our minimalist approach to modules. A simple hello world program compiled to dll is easily 80kb. However with some work, we can get it down to 3kb.

By excluding the C runtime libraries, we can enjoy fast compile times and tiny dlls (why does that sound like a bad thing). We however lose new & delete operators, and the ability to use static variables, among other things - which is somewhat desirable as it isn't too different from regular scripting languages with their managed memory models.

Now ScriptInterface has no member variables. This is purely a matter of choice; you could decide it's easier to enforce proper initialisation of core member variables if they lived in a base class. I opted to shift this responsibility to individual scripts since they're more iterative and it'll involve less maintenance in the long run. I instead employ a utility ScriptHeader.h file for all our useful macros to keep things looking tidy. The idea is that every script should #include it.

ScriptHeader.h mainly cuts down a lot of the verbose setup involved. A typical example is the allocation of the script class. Since we have no new or delete, ScriptHeader provides SCRIPT_EXPORT(), which declares a local buffer the size of the class and defines a CreateScript() function for exporting the dll object, in which it uses placement new to construct the script and return the instance. It also condenses the method signatures and member variables into standard macros, and supplements a few other excluded C runtime functions.
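The buffer-plus-placement-new idea can be sketched roughly as follows. This is a hypothetical reconstruction, not the macro from the sample: SCRIPT_API, s_scriptBuffer, host_tick and the cut-down ScriptInterface are assumptions, and it includes the standard <new> header for placement new, which a build without the C runtime would have to supply itself.

```cpp
#include <cassert>
#include <new> // placement new

// Cut-down interface, just enough for the sketch.
class ScriptInterface
{
public:
    virtual void OnUpdate() = 0;
protected:
    ~ScriptInterface() {}
};

#if defined(_WIN32)
    #define SCRIPT_API extern "C" __declspec(dllexport)
#else
    #define SCRIPT_API extern "C"
#endif

// Hypothetical reconstruction: reserve static storage for one instance and
// construct the script into it, so no heap allocation is involved.
#define SCRIPT_EXPORT(ScriptClass)                                                 \
    alignas(ScriptClass) static unsigned char s_scriptBuffer[sizeof(ScriptClass)]; \
    SCRIPT_API ScriptInterface* CreateScript()                                     \
    {                                                                              \
        return new (s_scriptBuffer) ScriptClass();                                 \
    }

class TestScript : public ScriptInterface
{
public:
    void OnUpdate() override { ++updates; }
    int updates = 0;
};

SCRIPT_EXPORT(TestScript)

// Host side: in the real system the CreateScript pointer would come from the loaded dll.
inline void host_tick(ScriptInterface* script)
{
    assert(script != nullptr);
    script->OnUpdate();
}
```

One consequence of this layout is that each module can hold exactly one live script instance, which fits the one-script-per-dll model described here.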

ScriptLoader

Now for the most verbose class: ScriptLoader comprises two public facing methods, Load() and Unload(), and one private method, compileScript(), which is called from Load(). Nothing fancy. Until the calls to Windows APIs start piling up.

compileScript is responsible for spawning cl.exe with the right parameters and reporting any compile errors to the VS output window. You'll notice that it takes a version number as a second argument - this is there to work around the fact that unloading a dll in Visual Studio doesn't actually unload its associated .pdb (debug database) file. (This appears to have been fixed in VS2015.) Attempting to compile the dll a second time will fail as the .pdb is still write-locked by VS. That leaves two immediate alternatives. The first is to compile with the '/Zl' flag and avoid generating the pdb - which sucks because we certainly want the ability to debug script code. The other option, which I use in the sample, is versioning - where each recompile spits out a new copy of the dll rather than attempting to overwrite the previous one. This can get messy quickly, but I like to think of it as a Content Management problem for another day/post. The interim solution is to spit them all out into a '.\\modules' folder which can be emptied without qualm. What I actually do in a production app is default to using '/Zl' to spawn non-debuggable dlls, and use a separate trigger mechanism to create versioned dlls when debugging is needed. I briefly explored a third option of manually enabling/disabling specific pdbs via VS but that's not worth going into.
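The versioning scheme boils down to deriving a fresh output name on every recompile. A minimal sketch of that step, assuming a hypothetical versioned_module_path helper and a made-up "_v<N>" naming convention (the sample's actual naming may differ):

```cpp
#include <cassert>
#include <string>

// Hypothetical naming scheme: .\modules\<script>_v<version>.dll
inline std::string versioned_module_path(const std::string& scriptName, unsigned version)
{
    assert(!scriptName.empty());
    return ".\\modules\\" + scriptName + "_v" + std::to_string(version) + ".dll";
}
```

The loader would bump the version on each reload, so the new dll and its pdb never collide with files that are still locked.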

We also want to be able to see the output of our spawned process. Thankfully Windows provides pipes for that. Even better, using OutputDebugString to print any error strings allows you to click and navigate to the error straight from the output window, like a regular compile error!

Some of the compile options are pretty standard; the most noteworthy is '/GR-', which disables RTTI and dramatically cuts executable size.

Once compileScript has succeeded, we then go ahead and load the dll via LoadLibrary. Great liberty is taken in all this due to fore-knowledge of our directory structure and extension names - allowing script-loads to be requested based on filename alone, while the extensions and full path are added within the loader. You'd have to pardon my flippant use of hardcoded paths in that part of the code, it's just how I roll in these mean streets etc.

We additionally maintain a cache of loaded modules so they can be properly cleaned up before a reload is issued. Besides this, the overall lifetime management is left to the client.

Sample Code

A simple Win32 console application in which the script "test.cpp" is used to printf to the console. It can be edited at runtime and reloaded by pressing 'r' with the app in focus.

https://bitbucket.org/nanofracture/hotreloadablecpp

Best Practices

For ScriptEnvironment, use Service Locators or Dependency Injection rather than Singletons or Statics. They're the better poisons in this case, as Statics and Singletons generate different instances across the main executable and loaded dynamic libraries.
ScriptEnvironment can live anywhere. Note that the Engine knows absolutely nothing about it; it's just forward declared. Only the script & obviously whatever initialises it need to know about it - making it super flexible, as maintaining it can easily get messy. In most other cases, you can take advantage of the DataContext.
Try to avoid pointers in scripts, be more scripty (I know this contradicts my DataContext usage, I am ashamed). This is mainly to avoid dangling references or script-side memory walks.
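As a rough illustration of the first point, a ScriptEnvironment built on injection can be as small as a struct of service pointers that the host fills in. Everything here (Logger, CountingLogger, script_update, and the single logger field) is hypothetical; the article deliberately leaves the contents of ScriptEnvironment to the reader.

```cpp
#include <cassert>

// Hypothetical service exposed to scripts.
struct Logger
{
    virtual ~Logger() {}
    virtual void Print(const char* text) = 0;
};

// The host owns the services and injects pointers to them, so the script
// talks to the host's instances instead of its own per-module statics.
struct ScriptEnvironment
{
    Logger* logger = nullptr;
};

// Script side: everything is reached through the injected environment.
inline void script_update(ScriptEnvironment& env)
{
    assert(env.logger != nullptr);
    env.logger->Print("hello from script");
}

// Host side: a concrete service implementation.
struct CountingLogger : Logger
{
    int calls = 0;
    void Print(const char*) override { ++calls; }
};
```

Since the script only ever sees the interface, the host can swap in a different Logger without the script being recompiled.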

In a future post, I might talk about how to add Reflection for easier extensibility. Or I may not - let's just vibe with it. Thanks for reading. Questions & comments are super welcome, but please be gentle... it's my first post.

References

So much MSDN

https://blog.molecular-matters.com/2014/05/10/using-runtime-compiled-c-code-as-a-scripting-language-under-the-hood/

https://github.com/RuntimeCompiledCPlusPlus/RuntimeCompiledCPlusPlus

https://github.com/i-saint/DynamicPatcher

http://www.catch22.net/tuts/reducing-executable-size