ZetaVM, my new compiler project

Like many of you I'm sure, I've wanted to create my own programming language for a long time. I think it's a common aspiration for many programmers, to create a language with all the features we love and none of the ones we hate. Our own "ultimate programming language", created in our own image. What I've come to realize over time, however, is that I'm actually not quite sure what should go into my ultimate programming language. I have a lot of ideas, but they don't necessarily fit into a coherent whole. It's also the case that my ideas about what the "ultimate programming language" should be like keep changing as I gain more programming experience and get exposed to new ideas.

My PhD was in compiler design, and this is something I truly enjoy playing with. As such, I've decided that my next programming language project wouldn't be a programming language per-se, but actually a compiler, a platform to create new programming languages. I'm doing this in part because I enjoy it, and it's something I feel confident I'm good at, but also because I think I can build a platform that will make it much easier for myself and others to do programming language design and experimentation. ZetaVM is going to have, as one of its main design goals, to make creating new programming languages very accessible. It will make it possible for anyone who's mastered a language such as Python to create a language of their own, in less than 2000 lines of code. Not only that, but ZetaVM will instantly make your language decently fast. It will have JIT optimizations suitable to languages such as Python/Lua/JavaScript, and will instantly give you fast objects, type-specialization, etc.

ZetaVM is a virtual machine for dynamic programming languages. It will provide native support dynamic typing and most common data types found in Python/Lua/JS/Ruby, such as strings, extensible objects, extensible arrays. What makes it particularly easy to get your own language running on this VM is that Zeta's Intermediate Representation (IR) is representable as a textual format similar to JSON. This makes it fairly trivial for you to write, say, a Python parser for your new language, and generate Zeta IR in a textual format at the output. You don't have to worry about implementing dynamic typing, or register allocation, or garbage collection, or arrays and objects, all of that is done for you. I've created a simple language called Plush (JS and Lua's bastard child), which demonstrates how this can be done, and serves to help me bootstrap and test the system.

Beyond making it easy for myself and others to create programming languages, Zeta will be a platform for me to try some bold new ideas in the design space of programming languages and JIT compilers. I would like to try and tackle one of the biggest issues plaguing programming languages today, which is that of code rot. My goal is to eventually freeze the IR and APIs provided by Zeta, so that code that runs on Zeta today might have a chance of still working in 20 years, without any changes. This goal is ambitious, but I have some ideas which I believe might make it work.

Finally, one big disclaimer I should give is that Zeta is still a young and immature project. In its current state, Zeta is experimental and will have many breaking changes, as most new languages/platforms do. Zeta also currently only has a naive interpreter which walks the object-based IR and is dog slow, about 200K instructions per second. I'm currently working on an interpreter that will compile the object-based IR into a lower-level internal IR. This interpreter will use Basic Block Versioning (BBV) and self-modifying code. I believe it should realistically able to reach speeds of 100MIPS within the coming months. My plan after that is to build a lightweight JIT which will sit on top of the optimizing interpreter and compile the internal IR to machine code.