
Typed vs Untyped Virtual Machines

June 8, 2022

One of the things that’s been on my mind recently is the idea of building a virtual machine for code archival purposes. Something that’s optimized for long-term stability and longevity, with the goal of helping prevent code rot. This is a fundamentally hard problem to solve, because the world changes, and so software changes with it, but at the same time, there’s no fundamental reason why software written 5, 10, 20 or even 50 years ago couldn’t run anymore.

To some extent, you can still run 50 year old software if you have the right emulator, but I think that this is going to become harder and harder to do, because modern software stacks are becoming increasingly complex and typically have a lot of external dependencies. Then there’s the added problem that you also have to worry about the emulator itself suffering from code rot. A friend of mine who’s learning about programming was asking me the other day why it is that iPhone apps need to be updated so often when the functionality of the apps isn’t changing. It’s because the foundation that these apps are sitting on keeps changing, and if said apps aren’t routinely updated, they’ll quickly stop working. Often, in the software world, dependencies are liabilities.

Many kinds of software could be designed to work just fine with a very limited set of inputs and outputs. If you think about something like a spreadsheet program, a text editor, or many kinds of 2D and 3D games, most of these programs fundamentally only need access to a keyboard, a mouse, the filesystem and a display. All of these things have been available for over 30 years, and although the fundamentals of what they do haven’t really changed, the APIs to access them are changing constantly. Many software programs could be much better protected from code rot if they were written to run on a VM with stable APIs/ABIs and fully statically linked.

In my opinion, to protect software from code rot, we need an approach to software design that’s more intentional, minimalistic and disciplined. That approach to design should start with the VM itself. The VM should provide a stable set of APIs. It’s easier to keep the set of APIs stable if the VM itself is minimalistic and small. Keeping the VM small makes it easier to port to new architectures, which makes keeping software running easier. A small VM is also easier to implement correctly and consistently across platforms.

There are many different ways to design a VM. The design I have in mind is something like a single-threaded bytecode VM. The VM could be register-based or stack-based. I have a certain fondness for stack-based VMs because they’re very simple, easy to target, and stack-based bytecode has the neat side-effect that it provides implicit liveness information (values popped off the stack are dead). At the end of the day, whether a VM is register-based or stack-based isn’t very important because it’s entirely possible to convert between the two.
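To make the stack-machine idea concrete, here’s a minimal sketch of a stack-based bytecode interpreter; the opcodes and encoding are invented for illustration, not taken from any real VM. Note how ADD and MUL pop their operands: once popped, a value is dead, which is the implicit liveness information mentioned above.

```python
# Minimal sketch of a stack-based bytecode interpreter.
# Opcode names and encoding are invented for illustration.
PUSH, ADD, MUL, HALT = range(4)

def run(code):
    stack = []
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])  # operand is encoded inline
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()  # popped values are dead: implicit liveness
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()

# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
```

The dispatch loop is essentially the whole machine, which is part of why stack VMs are so easy to implement and to target.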

Another important design axis which is not as trivial and can have much broader implications is whether the VM is typed or untyped. The WASM and Java VMs are typed in the sense that they have a built-in notion of the type of values, functions, modules and objects. In contrast, something like a Commodore 64 emulator would emulate the instruction set of its 6510 CPU, but would do so without assigning types to values in registers or in memory, treating values as bits and bytes without really tracking what they represent.

There are advantages to building a type system into a VM. There’s the potential for optimizations, potentially increased safety, and also the ability to more easily decompile and repair broken programs. However, on the flip side, a VM design that incorporates a type system seems inherently more complex and more constrained. Simply put, if you want to enforce rules around typing in your VM, then you have to build typing rules into your design, and that forces you to try to precisely categorize the kinds of computations your VM can perform with a lot more detail. This in turn forces the typed VM design to take on a lot more responsibilities.

In order to track the types of object fields and array elements, the typed VM design has to have a notion of what an object (or struct) is, what an array is, and so on. It has to have a pre-established notion of what a function is so that it can assign types to function calls. It also has to have a notion of every kind of control flow structure it supports so that it can assign types to them. In contrast, an untyped VM design can easily represent any control flow mechanism, be it function calls, exceptions, continuations or coroutines, using simple jump instructions. The untyped VM also doesn’t care about the way objects and arrays are implemented, because it treats memory as a linear array of bytes.
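As a rough illustration of that point (the instruction set is invented, not any real VM’s), here is how an untyped VM can express a loop with nothing but conditional and unconditional jumps over a flat byte array of memory — no built-in notion of loops, functions or objects required:

```python
# Sketch of an untyped VM: memory is a flat bytearray, and all control
# flow can be built from plain jumps. Instruction set invented for illustration.
def run(code, mem_size=256):
    mem = bytearray(mem_size)
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "push":
            stack.append(arg)
        elif op == "load":            # push(mem[arg])
            stack.append(mem[arg])
        elif op == "store":           # mem[arg] = pop()
            mem[arg] = stack.pop() & 0xFF
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "jmp":             # unconditional jump
            pc = arg
            continue
        elif op == "jz":              # jump if popped value is zero
            if stack.pop() == 0:
                pc = arg
                continue
        pc += 1
    return mem

# Sum 5 + 4 + 3 + 2 + 1 into mem[1], using only jumps for the loop:
prog = [
    ("push", 5), ("store", 0),                              # mem[0] = 5
    ("load", 0), ("jz", 13),                                # while mem[0] != 0:
    ("load", 1), ("load", 0), ("add", None), ("store", 1),  #   mem[1] += mem[0]
    ("load", 0), ("push", 1), ("sub", None), ("store", 0),  #   mem[0] -= 1
    ("jmp", 2),
]
```

The guest is free to layer any calling convention, exception scheme or object layout it likes on top of these primitives.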

Beyond having the ability to assign types to everything, it seems to me that a typed VM design must inherently take on more responsibilities than an untyped VM design. In order to maintain typing constraints while offering decent performance, the typed VM is forced to take the responsibility of implementing a sophisticated JIT compiler that will understand and enforce all these constraints. This is because the running program can’t be trusted to implement its own JIT compiler as this can’t be proven safe. Furthermore, if the typed VM wants to support garbage collection, it also has to take on the responsibility of implementing its own GC, because once again, the running program can’t be trusted to manage its own memory while respecting typing constraints.

An untyped VM design can be much more minimalistic. It has to enforce a small set of hard constraints, such as making sure that pointer dereferences respect valid address bounds so that the running program can’t crash the host VM, but it doesn’t really need to care about the types of values or enforcing typing constraints. For performance, it can implement a very barebones JIT compiler based on dynamic binary translation, but it doesn’t have to care about optimizing type checks, and it can even allow the running program to manage its own memory and implement its own garbage collector.
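That small set of hard constraints can be sketched as follows: the only rule the VM enforces is that guest memory accesses stay within bounds, so the guest can never corrupt the host. This is a hypothetical API for illustration only.

```python
# Sketch: the one hard rule an untyped VM must enforce is that guest
# memory accesses stay in bounds. The class and method names are invented.
class GuestMemory:
    def __init__(self, size):
        self._mem = bytearray(size)

    def load(self, addr):
        if not 0 <= addr < len(self._mem):
            raise RuntimeError(f"guest trap: load out of bounds at {addr}")
        return self._mem[addr]

    def store(self, addr, value):
        if not 0 <= addr < len(self._mem):
            raise RuntimeError(f"guest trap: store out of bounds at {addr}")
        self._mem[addr] = value & 0xFF
```

An out-of-bounds access traps the guest without ever endangering the host, and that is the entire safety story — no type tags, no verifier.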

In summary, I think there’s a strong case to be made that an untyped VM design can easily be much smaller and more minimalistic than a typed VM design. A simpler untyped VM has two major strengths. The first is that it doesn’t place as many restrictions on running programs. Programs can implement their own control flow structures, their own GC, or even their own JIT. The second is that a smaller, simpler VM is much easier to port, reimplement and maintain. If you think about the amount of effort that would be required to build a new Java VM from scratch and make it perform well, you quickly realize that such an undertaking is only possible for a massive corporation. There is still no official RISC-V support by the JVM, and it’s easy to understand why.

  1. So excited to see you diving deeper into small VMs.

    Types at the VM level seem like they would make things that should be simple more painful: you’ll want to always be able to reuse one piece of data for any purpose that might come up, without having to cast it into another type. I can see many cases, especially for a stack machine, where this would make things inefficient.

    One thing that you could do, if you do decide to go for a stack machine, would be to make stacks a primitive, so you have a stack of stacks, like the Joy programming language. This would allow you to build types more easily, but it wouldn’t enforce them at the assembly layer.

    For anyone reading this and curious to learn more about VMs for archival, I’d suggest checking out:

    tr2015004_cuneiform.pdf

  2. Matthew Fernandez permalink

    Great article!

    This is something I’ve pondered myself. But what I always get stuck on is: why not use an existing architecture? For archival purposes, why not e.g. pick something like MIPS? Sure, it has some characteristics that are irrelevant for a VM and it doesn’t fit modern machines well. But it’s trivial to emulate, and you get a bunch of toolchain support that already works. And you get execution today on almost any platform via qemu.

    As you observe, retaining types at the machine level only gives you a benefit at the expense of pushing complexity into the platform (current or future, physical or virtual). Type erasure, for better or worse, makes the platform implementer’s job easier.

    Having said that, I think the dotnet VM (or CLR or whatever it’s called…) has some very interesting ideas around leveraging retained type information to do optimizations.

    Looking forward to your next post :)

    • The obvious reason to use an existing instruction set would be to benefit from existing tooling.

      There are many reasons not to, though. Real CPUs are pretty complex and have very large instruction sets. That makes it more likely that there will be implementation bugs or differences between implementations. There’s more room for “undefined behavior”.

      Then you have the problem that maybe you can’t support some instructions, and so you can’t quite use the existing tooling, you have to maintain your own fork. That can quickly become a massive maintenance burden, or you can end up several versions behind. You might have to provide elaborate instructions on how to build this modified toolchain and which options to link and compile with.

      Then there’s the issue of what you do about system calls, or how you interface with the platform. I’d prefer some minimalistic APIs. Using an existing instruction set would tend to push you in the direction of using system calls and mirroring the intricacies of Linux system calls, for example. Not that you have to do it that way, but it might be tempting, and I don’t think it would be a good idea.

      Then there are other questions, like how you would handle things like parallelism. If you use a “real” instruction set, I think it will push you in the direction of implementing a traditional threading model, mutexes, etc. You’ll probably want to use whatever the instruction set provides to implement these primitives. I think it can quickly get hairy and again lead to differences in implementations.

      • Matthew Fernandez permalink

        The syscall part is mostly OS convention, right? Just don’t use the `syscall` instruction or whatever the equivalent is. Having said that, this obviously gives you no way of doing I/O. You could imagine narrowly emulating just `write` to fd 1 for the purposes of output, but obviously this is pretty yuck. Not sure where I’m going with this, just wondering if the cure is worse than the disease.

        Is parallelism relevant for archival purposes? This is sort of a huge question, but the version of software parallelism we have today seems pretty clearly an artifact of the hardware implementations of parallelism we’ve experienced thus far. If we want a current “program” to map onto hardware 1000 years from now, it will likely have a completely different (more refined?) concept of parallelism. If you want to exploit future parallelism, it seems to me your program has to be written at a much higher level than any current VM exposes. OTOH maybe this is exactly your point.

        • I have a few ideas for some forms of parallelism that wouldn’t cover every use case, but would be a lot more portable and predictable than threads. For example, providing some parallelized primitives for doing things like matrix multiplications, so that the program can just ask for matrices to be multiplied without having to care about the number of CPUs on the host machine and such.

          Is parallelism relevant, well, one of the goals is archival, but I would like this system to be usable for many classes of software people currently use as well :)
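          As a sketch of that idea (a hypothetical host primitive, not a real API), the guest would hand the VM whole matrices and let the VM decide how, or whether, to parallelize:

```python
# Hypothetical VM host primitive: the guest asks for a matrix product
# and never sees threads or core counts. Shown here as a plain
# sequential reference implementation; a real VM could parallelize it.
def vm_matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]
```

Because the primitive’s semantics are sequential and deterministic, every implementation must agree on the result, no matter how it schedules the work internally.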

  3. bytefu permalink

    Although obviously biased as a long-term fan of languages that are compiled to native code, I think static binary translation is better not only because it delivers predictable performance, but also because it reduces the overall overhead of translation, since translation only happens once per application. Android’s AOT compilation is a prime example.

  4. kiedtl permalink

    Somewhat off-topic, but what algorithms for converting register based code to stack based code do you know of?

    • I think it’s considerably easier to go stack => register, but it is possible to go the other way around. AFAIK JikesRVM (a Java VM) does the conversion of stack bytecode => SSA, and then SSA => machine code. To do the conversion, you’d need to compute the live intervals, which for the values you push and pop is going to be pretty trivial, and for local variables you can use linear scan. You might also want some peephole optimization passes to patch things up.
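      One simple way to sketch the stack-to-register direction (opcode names invented for illustration): simulate the stack at translation time, pushing fresh virtual-register names instead of values, and emit three-address code. Because the stack heights are known statically, the live intervals for pushed and popped values fall out of the simulation for free.

```python
# Sketch: lower stack bytecode to register-style three-address code by
# abstract interpretation of the stack. Opcode names are invented.
def stack_to_registers(code):
    stack, out, fresh = [], [], 0

    def new_reg():
        nonlocal fresh
        fresh += 1
        return f"r{fresh}"

    for op, arg in code:
        if op == "push":
            r = new_reg()
            out.append(f"{r} = {arg}")
            stack.append(r)
        else:  # binary ops: add, mul, ...
            b, a = stack.pop(), stack.pop()
            r = new_reg()
            out.append(f"{r} = {op} {a}, {b}")
            stack.append(r)
    return out

# (2 + 3) * 4 as stack code:
tac = stack_to_registers([("push", 2), ("push", 3), ("add", None),
                          ("push", 4), ("mul", None)])
```

Since every virtual register is assigned exactly once, the output is already in SSA form for straight-line code; handling control flow joins is where the real work (phi placement) comes in.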

  5. Kyr permalink

    Can you comment about the design of the toit vm?
    What do you think of it?
