Tachyon has a Garbage Collector!

I just completed a Garbage Collector (GC) for the Tachyon JavaScript JIT compiler. This is one of the big 3 features Tachyon had been missing. A GC isn't entirely crucial for my thesis research, but Tachyon needs one to pass the stage of "toy compiler" and become usable in the Real World (TM). It's just one of those features a dynamic language compiler needs to be taken seriously. The GC is about 2000 lines of code in total and took me about two weeks to write and debug. It's a pretty simple stop-and-copy semi-space collector based on Cheney's algorithm. My main goal at this point was to get something working. I chose to keep the design nice and simple, to facilitate debugging.

What makes this GC a little more interesting is that the GC itself, as the rest of Tachyon, is written in my extended JavaScript dialect. There isn't a single line of C code in there (not one!). The GC is entirely compiled to native x86 machine code by Tachyon. I made this choice because I felt it fit better with Tachyon's design philosophy of writing as much of the compiler as possible in JavaScript. I already had many primitives in place to access the layouts of objects in memory. It actually seemed like writing the GC in C would actually make the system more complex, as I would have had to design and implement interfaces for the C code to access all the Tachyon objects. This C code would then become out-of-sync if the memory layout of the said objects changed.

I implemented a few unit tests to verify the GC's proper functioning under realistic use cases. These have predictably revealed many GC bugs. The main difficulty has been the realization that a moving GC must absolutely update every single reference in the system. If a single stack spill somewhere in the system is not updated and points to an old value after a collection, that's a dangling pointer which might well crash the system. This is complicated further by the fact that Tachyon has multiple function call mechanisms (regular calls, calls with apply, calls to functions using the arguments object). Each of these call mechanisms must encode information about the current stack frame at the moment of the call which the GC can access in order to traverse the stack roots.

After much debugging, I'm proud to report that the GC now appears to work properly and its performance seems fully acceptable for my purposes, with GC times from 4ms to 20ms on the tests I ran (and this includes some debug data logging). I plan to implement a few more unit tests so that I can verify the GC's functioning even more thoroughly. I wouldn't want the GC to cause nasty unpredictable errors to pop up down the road, which is a real danger. I feel committed to ensuring that Tachyon is a reliable and well-tested compiler, if just to make my own life easier.

Implementing the GC has forced me to implement a stack traversal infrastructure. This is further good news, because I can reuse this to implement exception handling in Tachyon. I don't know when exactly I'll do that, as I'll be busy in the next few weeks working on things more directly related to my thesis... But, when I get around to implementing exceptions, half the work will already be done.

I would also like to announce that I've decided to make the Tachyon source publically available. I've been hesitant to do this for a while, thinking that I should perhaps wait until Tachyon is "polished enough" for a public release, but I'm honestly not sure my perfectionist leanings will ever be fully satisfied. Hence, I've linked the GitHub URL of the Tachyon repo. If you're interested, feel free to check out the source code and give me some feedback.