Building a Copying GC for the Plush Programming Language

This is the fourth post in a series about Plush, a dynamically-typed programming language I've been building in my spare time. The language draws inspiration from Lox/JavaScript, and I created it as a platform to experiment and have fun with sound and graphics programming. It's very easy to just open a window and draw some pixels with very little boilerplate.

One of the key features of this language is that it has actor-based parallelism, which makes it relatively easy, even for beginners, to write multithreaded programs without having to think about locks and memory safety. I took some care to implement the Virtual Machine (VM) in such a way that internally, there is no global VM lock, meaning that independent actors only need to synchronize with each other when they are sending and receiving messages, but otherwise, they can execute fully independently.

One of the trickier aspects of the Plush VM design is that this is a garbage collected language, and actors can send messages (meaning heap-allocated objects) to one another. So, from the beginning, there was a design question of how I could make it so that these messages, which can be entire subgraphs of objects with cycles in them, could be sent from one actor to another, with dead objects being garbage-collected, and at the same time keep all of this efficient.

In an actor-based system, you don't want two actors holding a mutable reference to the same object. That means if you send an object from one actor to another, you either need to copy it, or you need to freeze the object (make it immutable). I wanted to avoid needing to freeze objects before sending messages because I felt it added complexity to the system and would mean the programmer would need to keep track of what is and isn't frozen. If you want to send a subgraph of objects to another actor, but you want to keep a local copy, then you first need to deepcopy these objects, and then deepfreeze your subgraph. It's conceptually much simpler if you can just send the root of the subgraph as a message, and the host VM copies everything for you behind the scenes. Furthermore, there's an issue with requiring messages to be immutable. The assumption there is that immutable objects can be safely shared between actors, but if you do that, then it means you need a VM-wide garbage collection to periodically synchronize with all your actors. Supporting this kind of immutable object sharing isn't inherently a bad idea, but it does inevitably add complexity to the system.

In Plush, when you send a message, the VM silently copies the subgraph of objects referenced in the message. That does open up more questions with regards to memory management though. For one thing, if I want to send a message from actor A to B, then I need to copy the message into a memory space (an allocator) that is owned by B. However, I'd like to avoid a situation where B needs to take a lock on its own memory allocator every time it wants to allocate anything, because that would greatly slow down allocations. The solution I came up with is to give each actor two allocators. An actor has an internal private allocator that it can allocate objects from without ever needing to take a lock, and a mailbox allocator which is used to send messages to it.

The double allocator design has the key benefit that each actor can do its own private allocations without locking, which maximizes allocation performance. I decided to go with a bump allocator and a copying GC design. This keeps things simple and means that allocating memory is very fast. There was just one problem: I kept putting off implementing the garbage collector because I was dreading it. Past experience has taught me that implementing a GC can be very error-prone. There are often lots of sneaky bugs that show up, and you can end up needing to refactor key parts of your VM.

You can write many cool programs even without a GC. There are lots of programs that never allocate much memory during their execution. However, not having a GC does eventually become limiting. I tried to write code to train a neural network to do beat detection in Plush, and that program allocated a lot of matrices. I could only run it for a few minutes or so before it would crash, even on my desktop with 128GB of RAM. I was running into the limits of Alloc-Until-You-Crash Technology (TM). Still, I kept procrastinating on the GC, until my friend Marco jumped in two weeks ago and offered to help. Collaborating with someone on the feature made it much easier to find the motivation to tackle this challenge.

The good news is, it took Marco and I just over two weeks and we got the GC about 90% complete. There were some segmentation faults to debug, but it seems like we fixed most of the issues fairly quickly. Based on past experience, I went ahead and wrote a number of adversarial test programs to try and stress test the GC and surface bugs. That strategy proved effective. Turns out it's sometimes not that stressful to do hard things if you break them up into many small steps.

For fun, with the help of LLMs, I wrote a program that is a tribute to the classic Amiga Boing Ball demo. The 3D animation runs seamlessly at 60 frames per second on my MacBook Air M1. It performs GC about once per second, and the pause isn't noticeable. Even though it's only flat-shaded polygons, I feel pretty proud that the Plush interpreter is fast enough to do 3D graphics in real-time.

There are still a few missing bits in the garbage collector. Namely, if you send many large objects to an actor and its message queue gets backlogged, you could fill the message allocator and cause the program to panic. There needs to be logic in place so that actors wait and retry if the receiver's allocator gets full. There also needs to be logic for the message allocator to grow in size when necessary. However, all the essentials are in place, and it's now possible to run long-running programs on Plush.

If you would like to contribute, one issue I've noticed is that collections can be fairly slow once you get past 300K objects, which I don't think should be the case. There is a test program that grows a linked list which demonstrates the problematic slowdown. I suspect that this might have to do with Rust hashmap slowness, or that there may be some quadratic behavior somewhere. We could use some help profiling and investigating that issue. I'm also looking for contributors to write fun demo programs using Plush. We now have the ability to do both graphics and audio output, so it would be possible to write a simple music tracker, for example, or some kind of procedural infinite dungeon of some kind. Infinite Wolfenstein, anyone?