Wednesday 17 August 2016

Hot Reloadable C++ Modules


The concept of C++ 'scripting' is nothing new; some of you may be familiar with Molecular Musings and RuntimeCompiledCPlusPlus (amongst others), which have successfully implemented this to varying degrees.
My job here, however, is to give an end-to-end walk-through of implementing a basic version, sample source included.

Rather than place code snippets in this article, I'll mostly explain some of the gotchas and overall architecture with some insights into design decisions. If you're like me & you learn better by muddling through barely commented code, feel free to skip this and head straight to bitbucket as the code flow is pretty self-explanatory.

The crux of the idea is simple: most operating systems expose runtime APIs, callable from C++, for loading & unloading dynamic libraries as well as launching command line processes. Nothing thus prevents us from firing off a compile process, loading the resulting library, executing it, editing the source, recompiling, reloading, executing again, and repeating - all at runtime. Obviously there are caveats involved, and I'd strongly suggest reading Molecular Musings for a more complete perspective - especially if you're also wondering: why bother - why not just use an actual scripting language?

Personally, my motivation for pursuing this was a mix of curiosity & laziness; scripting languages often require significant setup for bindings and add considerable bloat. C++ modules, on the other hand, have the advantage of being native & incredibly lightweight, especially for small projects. Out-of-the-box parsing, intrinsic debugging, no direct need for reflection, just 3 extra source files. Let's go get it.

We'll be working with Windows in this case, but the concept's directly applicable to other modern operating systems.

Architecture

Our system is comprised of two major classes:
ScriptInterface: The base class for all scripts; a fully abstract class that lives in the Engine.
ScriptLoader: For compiling, loading and extracting the ScriptInterface of a given module file. Can live anywhere but makes sense in the Engine.

Bare Simplicities

A third useful class, largely ignored here for the sake of simplicity, is:
ScriptEnvironment: Exposes relevant systems to scripts, e.g. allocators, schedulers, factories, renderer(s). Implementation details are left to the reader/user, as you'll find that the power of your scripts relies almost directly on the systems exposed via this class.

For additional simplicity, we'll be working with stateless scripts, avoiding the need for any sort of serialisation during reloads. Rather we'll adopt a WPF style DataContext - which for our intents and purposes is just a void pointer. It's fine; the script and its client know exactly what's in that void, you can cast it; I believe in you. This DataContext can serve as a light interchange layer and/or state storage if necessary. We also ignore error handling to some extent.
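To make the void pointer idea a bit more concrete, here's a minimal sketch. Everything in it is made up for illustration (PlayerContext, UpdateV1, UpdateV2, RunAcrossReload); the two update functions stand in for two builds of the same script. The point is only that the state lives with the host, so swapping the code doesn't require serialising anything.

```cpp
#include <cassert>

// Hypothetical state owned by the host application, not by the script.
struct PlayerContext {
    int   score;
    float speed;
};

// Stand-ins for two builds of the same script entry point. Both receive the
// DataContext as an opaque pointer and cast it back to the agreed type.
void UpdateV1(void* dataContext) {
    assert(dataContext != nullptr);
    auto* ctx = static_cast<PlayerContext*>(dataContext);
    ctx->score += 1;
}

void UpdateV2(void* dataContext) {
    assert(dataContext != nullptr);
    auto* ctx = static_cast<PlayerContext*>(dataContext);
    ctx->score += 10;  // the edited logic after a "reload"
}

// Simulates: run the old script, hot reload, run the new script.
int RunAcrossReload() {
    PlayerContext ctx{0, 1.5f};
    void (*update)(void*) = &UpdateV1;
    update(&ctx);        // old code touches the context
    update = &UpdateV2;  // the "reload": only the code pointer changes
    update(&ctx);        // new code sees the same, untouched state
    return ctx.score;
}
```

Both sides have to agree on what's behind the pointer; nothing checks the cast for you.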

ScriptInterface

This simply contains three major abstract functions: OnInitialise, OnDestroy, OnUpdate. You can add more based on your project's needs. Perhaps even some support for reflection - for easier extensibility? Not sure there's any more to be said on this. Next.
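For the impatient, here's roughly what such an interface could look like. The three function names come from above; the signatures are my assumption for this sketch (passing the DataContext to every hook keeps the script object itself stateless), and TickContext, TickScript and RunLifetime are hypothetical names used only to exercise it.

```cpp
#include <cassert>

// Fully abstract, no member variables.
class ScriptInterface {
public:
    virtual void OnInitialise(void* dataContext) = 0;
    virtual void OnUpdate(void* dataContext) = 0;
    virtual void OnDestroy(void* dataContext) = 0;

protected:
    // Non-virtual and protected: scripts are never deleted through the base
    // pointer in this sketch, so no operator delete is ever required.
    ~ScriptInterface() = default;
};

// Hypothetical context and script, for illustration only.
struct TickContext {
    bool alive;
    int  updates;
};

class TickScript final : public ScriptInterface {
public:
    void OnInitialise(void* dc) override { Ctx(dc)->alive = true; }
    void OnUpdate(void* dc) override     { Ctx(dc)->updates += 1; }
    void OnDestroy(void* dc) override    { Ctx(dc)->alive = false; }

private:
    static TickContext* Ctx(void* dc) {
        assert(dc != nullptr);
        return static_cast<TickContext*>(dc);
    }
};

// Drives a script through its whole lifetime for a fixed number of frames.
TickContext RunLifetime(ScriptInterface& script, int frames) {
    TickContext ctx{false, 0};
    script.OnInitialise(&ctx);
    for (int i = 0; i < frames; ++i) script.OnUpdate(&ctx);
    script.OnDestroy(&ctx);
    return ctx;
}
```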

Fast and tiny

If you're into embedded programming, you'll probably appreciate our minimalist approach to modules. A simple hello world program compiled to a dll is easily 80 KB. However, with some work, we can get it down to 3 KB.

By excluding the C runtime libraries, we can enjoy fast compile times and tiny dlls (why does that sound like a bad thing). We however lose new & delete operators, and the ability to use static variables, among other things - which is somewhat desirable as it isn't too different from regular scripting languages with their managed memory models.

Now ScriptInterface has no member variables. This is purely a matter of choice; you could decide it's easier to enforce proper initialisation of core member variables if they lived in a base class. I opted to shift this responsibility to individual scripts since they're more iterative and it'll involve less maintenance in the long run. Instead I employ a utility ScriptHeader.h file that holds all our useful macros to keep things looking tidy. The idea is that every script should #include it.

ScriptHeader.h mainly cuts down a lot of the verbose setup involved. A typical example is the allocation of the script class. Since we have no new or delete, ScriptHeader provides SCRIPT_EXPORT(), which declares a local buffer the size of the class and defines the exported CreateScript() function, which uses placement new to construct the script in that buffer and returns the instance. It also condenses the method signatures and member variables into standard macros, and supplements a few other excluded C runtime functions.
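As a rough idea of what a macro like that might expand to, here's a sketch under a few assumptions: the SCRIPT_API helper, the buffer name, the one-function ScriptInterface and HelloScript are all mine, not lifted from the sample. I also pull in the standard new header for placement new; a build without the C runtime would likely declare placement new by hand instead.

```cpp
#include <cassert>
#include <new>  // placement new; a CRT-free build would likely declare it by hand

#if defined(_WIN32)
#  define SCRIPT_API extern "C" __declspec(dllexport)
#else
#  define SCRIPT_API extern "C"
#endif

// Cut-down interface, just enough for the sketch.
class ScriptInterface {
public:
    virtual void OnUpdate(void* dataContext) = 0;
protected:
    ~ScriptInterface() = default;
};

// Declares a static buffer sized (and aligned) for ScriptClass and exports
// CreateScript(), which constructs the script inside it with placement new.
#define SCRIPT_EXPORT(ScriptClass)                                                  \
    alignas(ScriptClass) static unsigned char s_scriptStorage[sizeof(ScriptClass)]; \
    SCRIPT_API ScriptInterface* CreateScript() {                                    \
        return new (s_scriptStorage) ScriptClass();                                 \
    }

// What a script file might then look like.
struct HelloContext { int greetings; };

class HelloScript final : public ScriptInterface {
public:
    void OnUpdate(void* dc) override {
        assert(dc != nullptr);
        static_cast<HelloContext*>(dc)->greetings += 1;
    }
};

SCRIPT_EXPORT(HelloScript)
```

Since the buffer is zero-initialised static storage, nothing here needs heap allocation. One module exports exactly one script this way, which suits the one-file-per-script setup.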

ScriptLoader

Now for the most verbose class: ScriptLoader comprises two public-facing methods, Load() and Unload(), and one private method, compileScript(), which is called from Load(). Nothing fancy. Until the calls to Windows APIs start piling up.

compileScript is responsible for spawning cl.exe with the right parameters and reporting any compile errors to the VS output window. You'll notice that it takes a version number as a second argument. This works around the fact that unloading a dll in Visual Studio doesn't actually unload its associated .pdb (debug database) file, so attempting to compile the dll a second time fails because the .pdb is still write-locked by VS. (This appears to have been fixed in VS2015.) That leaves two immediate alternatives. The first is to compile with the '/Zl' flag and avoid generating the pdb, which sucks because we certainly want the ability to debug script code. The other option, which I use in the sample, is versioning: each recompile spits out a new copy of the dll rather than attempting to overwrite the previous one. This can get messy quickly, but I like to think of it as a Content Management problem for another day/post. The interim solution is to spit them all out into a '.\\modules' folder which can be emptied without qualm. What I actually do in a production app is default to using '/Zl' to spawn non-debuggable dlls, and use a separate trigger mechanism to create versioned dlls when debugging is needed. I briefly explored a third option of manually enabling/disabling specific pdbs via VS, but that's not worth going into.
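Here's a small sketch of how the versioned names and the command line could be assembled. The function names, the '_N' suffix and the '.\\scripts' source folder are assumptions of mine; only the '.\\modules' output folder comes from the text above. The cl.exe options used are /nologo, /LD (build a dll), /Zi (debug info in a pdb), /GR- (no RTTI) and /Fe, /Fd, /Fo (names for the dll, pdb and obj). The sample's actual flag set may well differ.

```cpp
#include <cassert>
#include <string>

// ".\modules\<name>_<version>": every recompile gets a fresh stem, so the
// dll and pdb from the previous build are never overwritten.
std::string VersionedModuleStem(const std::string& scriptName, unsigned version) {
    assert(!scriptName.empty());
    return ".\\modules\\" + scriptName + "_" + std::to_string(version);
}

// Builds the command line that a compileScript-style function would spawn.
std::string BuildCompileCommand(const std::string& scriptName, unsigned version) {
    const std::string stem = VersionedModuleStem(scriptName, version);
    std::string cmd = "cl.exe /nologo /LD /Zi /GR-";
    cmd += " /Fe\"" + stem + ".dll\"";                      // output dll
    cmd += " /Fd\"" + stem + ".pdb\"";                      // matching pdb
    cmd += " /Fo\"" + stem + ".obj\"";                      // intermediate obj
    cmd += " \".\\scripts\\" + scriptName + ".cpp\"";       // hypothetical source folder
    return cmd;
}
```

Bumping the version on every reload is all it takes to sidestep the locked pdb; cleaning up the old copies is the messy part left for later.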

We also want to be able to see the output of our spawned process. Thankfully Windows provides pipes for that. Even better, using OutputDebugString to print any error strings allows you to click and navigate to the error straight from the output window, like a regular compile error!
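The pipe dance goes roughly like the sketch below. RunAndCapture and ReportToDebugger are names I've made up, and error handling is kept to a bare minimum. The Windows branch uses CreatePipe, CreateProcessA, ReadFile and OutputDebugStringA; the other branch is only a stand-in (popen) so the sketch can be tried on a non-Windows machine.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#if defined(_WIN32)
#  include <windows.h>
#endif

// Runs a command line and returns everything it wrote to stdout/stderr.
std::string RunAndCapture(const std::string& commandLine) {
    assert(!commandLine.empty());
    std::string output;
#if defined(_WIN32)
    SECURITY_ATTRIBUTES sa{};
    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE;  // the child must inherit the write end

    HANDLE readEnd = nullptr, writeEnd = nullptr;
    if (!CreatePipe(&readEnd, &writeEnd, &sa, 0)) return output;
    SetHandleInformation(readEnd, HANDLE_FLAG_INHERIT, 0);  // read end stays ours

    STARTUPINFOA si{};
    si.cb = sizeof(si);
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdOutput = writeEnd;
    si.hStdError = writeEnd;
    si.hStdInput = GetStdHandle(STD_INPUT_HANDLE);

    PROCESS_INFORMATION pi{};
    std::string mutableCmd = commandLine;  // the API wants a writable buffer
    const BOOL started = CreateProcessA(nullptr, &mutableCmd[0], nullptr, nullptr,
                                        TRUE, CREATE_NO_WINDOW, nullptr, nullptr,
                                        &si, &pi);
    CloseHandle(writeEnd);  // otherwise ReadFile never sees the end of the pipe
    if (started) {
        char buffer[512];
        DWORD bytesRead = 0;
        while (ReadFile(readEnd, buffer, sizeof(buffer), &bytesRead, nullptr) &&
               bytesRead > 0) {
            output.append(buffer, bytesRead);
        }
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
    }
    CloseHandle(readEnd);
#else
    // Non-Windows stand-in, not part of the approach described in this post.
    FILE* pipe = popen((commandLine + " 2>&1").c_str(), "r");
    if (!pipe) return output;
    char buffer[512];
    std::size_t n = 0;
    while ((n = std::fread(buffer, 1, sizeof(buffer), pipe)) > 0) {
        output.append(buffer, n);
    }
    pclose(pipe);
#endif
    return output;
}

// Sends compiler output to the debugger's output window (stderr elsewhere).
void ReportToDebugger(const std::string& text) {
#if defined(_WIN32)
    OutputDebugStringA(text.c_str());
#else
    std::fputs(text.c_str(), stderr);
#endif
}
```

Closing the parent's copy of the write end before reading is the step that's easy to forget; without it the read loop waits forever.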

Some of the compile options are pretty standard; the most noteworthy is '/GR-', which disables RTTI and dramatically cuts executable size.

Once compileScript has succeeded, we go ahead and load the dll via LoadLibrary. Great liberty is taken in all this due to foreknowledge of our directory structure and extension names, allowing script loads to be requested based on filename alone, while the extensions and full path are added within the loader. You'll have to pardon my flippant use of hardcoded paths in that part of the code, it's just how I roll in these mean streets etc.

We additionally maintain a cache of loaded modules so they can be properly cleaned up before a reload is issued. Besides this, the overall lifetime management is left to the client.
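A sketch of how the load step and the cache could fit together follows. To keep it free of platform headers I've put the OS calls behind a hypothetical ModuleApi struct; on Windows its three slots would wrap LoadLibraryA, GetProcAddress and FreeLibrary. ModuleCache, GenericFn and the fake namespace are illustrative names, not the sample's.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Cut-down interface, just enough for the sketch.
class ScriptInterface {
public:
    virtual void OnUpdate(void* dataContext) = 0;
protected:
    ~ScriptInterface() = default;
};

using GenericFn      = void (*)();              // plays the role of FARPROC
using CreateScriptFn = ScriptInterface* (*)();

// Hypothetical seam over the platform calls.
struct ModuleApi {
    void*     (*load)(const char* path);
    GenericFn (*symbol)(void* module, const char* name);
    void      (*unload)(void* module);
};

class ModuleCache {
public:
    explicit ModuleCache(ModuleApi api) : api_(api) {}

    // Unloads any previous build of `name`, loads the new dll, calls its factory.
    ScriptInterface* Load(const std::string& name, const std::string& dllPath) {
        Unload(name);
        void* module = api_.load(dllPath.c_str());
        if (!module) return nullptr;
        auto create = reinterpret_cast<CreateScriptFn>(api_.symbol(module, "CreateScript"));
        if (!create) { api_.unload(module); return nullptr; }
        loaded_[name] = module;
        return create();
    }

    void Unload(const std::string& name) {
        auto it = loaded_.find(name);
        if (it == loaded_.end()) return;
        api_.unload(it->second);
        loaded_.erase(it);
    }

    std::size_t Count() const { return loaded_.size(); }

private:
    ModuleApi api_;
    std::map<std::string, void*> loaded_;
};

// Usage with a fake ModuleApi, so the flow can be followed without a real dll.
namespace fake {
struct NullScript final : ScriptInterface { void OnUpdate(void*) override {} };
NullScript script;
int moduleTag = 0;    // its address stands in for a module handle
int unloadCalls = 0;
ScriptInterface* CreateScript() { return &script; }
void* Load(const char*) { return &moduleTag; }
GenericFn Symbol(void*, const char*) { return reinterpret_cast<GenericFn>(&CreateScript); }
void Unload(void*) { ++unloadCalls; }
}  // namespace fake

int ReloadTwiceAndCountUnloads() {
    fake::unloadCalls = 0;
    ModuleCache cache(ModuleApi{&fake::Load, &fake::Symbol, &fake::Unload});
    ScriptInterface* first = cache.Load("test", ".\\modules\\test_1.dll");
    ScriptInterface* second = cache.Load("test", ".\\modules\\test_2.dll");  // unloads test_1 first
    assert(first != nullptr && second != nullptr && cache.Count() == 1);
    return fake::unloadCalls;
}
```

Any ScriptInterface pointer handed out earlier is dead the moment its module is unloaded, which is exactly the lifetime problem left to the client.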

Sample Code

A simple Win32 console application in which the script "test.cpp" is used to printf to the console. It can be edited at runtime and reloaded by pressing 'r' with the app in focus.
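Stripped of the Win32 bits, the loop in such an app boils down to something like this sketch. RunSession, the 'q' key for quitting and the callback shapes are my own inventions; the keys come from a stream here rather than from the console, purely so the flow is easy to follow.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>

// Reads keys until 'q' (or end of input). Every 'r' bumps the version and asks
// for a recompile + reload; every key press is followed by one script update.
// Returns the last version that was requested.
unsigned RunSession(std::istream& keys,
                    const std::function<void(unsigned)>& reload,
                    const std::function<void()>& update) {
    assert(reload && update);
    unsigned version = 1;
    reload(version);  // initial compile + load
    char key = 0;
    while (keys.get(key) && key != 'q') {
        if (key == 'r') reload(++version);
        update();
    }
    return version;
}
```

In the real thing, `reload` would call into ScriptLoader and `update` would call the script's OnUpdate.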


Best Practices

  • For ScriptEnvironment, use Service Locators or Dependency Injection rather than Singletons or Statics. They're the better poisons in this case, as Statics and Singletons generate different instances across the main executable and loaded dynamic libraries.
  • ScriptEnvironment can live anywhere. Note that the Engine knows absolutely nothing about it; it's just forward declared. Only the script & obviously whatever initialises it need to know about it, which makes it super flexible, as maintaining it can easily get messy. In most other cases, you can take advantage of the DataContext.
  • Try to avoid pointers in scripts, be more scripty (I know this contradicts my DataContext usage, I am ashamed). This is mainly to avoid dangling references or script-side memory walks.
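To illustrate the first two points, here's a minimal sketch of a ScriptEnvironment acting as a tiny service locator. Logger, CountingLogger and ScriptOnUpdate are hypothetical names; the idea is simply that the host owns the one real instance of each service and hands pointers to the script, rather than both sides reaching for a static that each module would get its own copy of.

```cpp
#include <cassert>

// Hypothetical service interface exposed to scripts.
class Logger {
public:
    virtual void Log(const char* message) = 0;
protected:
    ~Logger() = default;
};

// Plain struct of service pointers, filled in by the host executable.
struct ScriptEnvironment {
    Logger* logger;
};

// Script-side code: it only ever touches what it was handed.
void ScriptOnUpdate(ScriptEnvironment* env) {
    assert(env != nullptr && env->logger != nullptr);
    env->logger->Log("hello from script");
}

// Host-side implementation; there is exactly one instance, owned by the host.
class CountingLogger final : public Logger {
public:
    void Log(const char*) override { ++count; }
    int count = 0;
};
```

Because the script only sees the abstract Logger, the host can swap the implementation without recompiling any script.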

In a future post, I might talk about how to add Reflection for easier extensibility. Or I may not - let's just vibe with it. Thanks for reading. Questions & comments are super welcome, but please be gentle... it's my first post.


References

So much MSDN
