
Zeta’s JITterpreter

June 11, 2017

About six weeks ago, I made ZetaVM open source and announced it on this blog. This is a compiler/VM project that I had been quietly working on for about 18 months. The project now has 273 stars on GitHub. This is both exciting and scary, because so much of what I want to do with this project is not yet built. I really want to show this project to people, but I also find myself scared that people may come, see how immature/incomplete the project is at this stage, and never come back. In any case the project is open source now, and I think it would be good for me to write about it and explain some of the design ideas behind it, if only to document it for current and potential future collaborators.

One of the main goals of ZetaVM is to be very approachable, and enable more people to create their own programming languages, or to experiment with language design. With that goal in mind, I’ve designed a textual, stack-based bytecode format that resembles JSON, but allows for cycles (objects can reference one-another). Functions, basic blocks, and instructions in this bytecode format are all described as a graph of JS-like objects. This is very user-friendly. Anyone can write a Python script that outputs code in this format and run the output in ZetaVM. It’s also very powerful, in a LISP-y code-is-data kind of way: you can generate and introspect bytecode at run time, it’s just made of plain objects and values that the VM can manipulate. ZetaVM has first-class bytecode.
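
To make this a bit more concrete, here is a toy sketch, in Python, of what building a tiny function in such an object-based format could look like. The field and opcode names are invented for illustration; they are not ZetaVM’s actual schema.

    def make_add_function():
        # One basic block holding a few stack-based instructions
        entry_block = {
            "instrs": [
                {"op": "get_arg", "idx": 0},  # push the first argument
                {"op": "get_arg", "idx": 1},  # push the second argument
                {"op": "add"},                # pop two values, push their sum
                {"op": "ret"},                # return the value on top of the stack
            ]
        }
        # The function is itself just an object pointing at its entry block;
        # blocks and functions can reference one another, so the graph may
        # contain cycles.
        return {"num_args": 2, "entry": entry_block}

    if __name__ == "__main__":
        print(make_add_function())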

The downside of all this, as you can imagine, is that it inherently has to be absolutely dog slow. Or does it? The first version of the Zeta interpreter traversed the graph of bytecode objects the naive way, and it was indeed dog slow. I’ve since written a new interpreter which removes the overhead of the object-based bytecode by dynamically translating it to an internal representation (think dynamic binary translation). The internal IR is compact and flat, and executing it involves no pointer hopping.
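
As a rough illustration of what that translation does, the sketch below flattens one block of object-based instructions into a compact list of integer opcodes with operands stored inline. The encoding is invented for the example and is not ZetaVM’s actual internal IR.

    OPCODES = {"get_arg": 0, "add": 1, "ret": 2}

    def translate_block(block, code):
        """Append a flat encoding of one object-based block to the code buffer."""
        start = len(code)
        for instr in block["instrs"]:
            code.append(OPCODES[instr["op"]])  # the opcode becomes a small integer
            if "idx" in instr:
                code.append(instr["idx"])      # the operand is stored inline, right after
        return start  # offset at which execution of this block begins

    # Flattening the entry block of the toy function from the previous sketch
    code_buffer = []
    offset = translate_block(make_add_function()["entry"], code_buffer)
    print(offset, code_buffer)  # 0 [0, 0, 0, 1, 1, 2]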

The new interpreter generates code on the fly, which means it is, by definition, a Just-In-Time (JIT) compiler. Its architecture is based on Basic Block Versioning (BBV), a compilation technique I developed during my PhD (talks and papers on the topic are linked here if you’re curious). BBV has the nice property that it generates code lazily, and the generated code naturally ends up compact and fairly linear. This is not done yet, but BBV also makes it possible to specialize code to eliminate dynamic type checks very effectively, and to perform various optimizations on the fly.
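
The core idea behind BBV can be sketched in a few lines: keep one compiled version per (block, type context) pair, and only produce a version the first time execution actually reaches it. The names below are illustrative only, not ZetaVM’s internals.

    versions = {}  # (block id, type context) -> start offset in the flat code

    def get_version(block, type_ctx, code):
        key = (id(block), type_ctx)
        if key not in versions:
            # Compile the block now, specialized to what we know about the
            # types of live values, and append it at the end of the buffer.
            versions[key] = len(code)
            for instr in block["instrs"]:
                code.append(specialize(instr, type_ctx))
        return versions[key]

    def specialize(instr, type_ctx):
        # Placeholder for type-driven simplification, e.g. dropping a dynamic
        # type check when type_ctx already proves an operand is an integer.
        # Here it just returns the instruction unchanged.
        return instr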

You might be wondering why I’m bothering with an interpreter, instead of just writing a JIT that generates machine code. One of the motivating factors is that Zeta is still at a very early stage, and I think that an interpreter is a better choice for prototyping things. Another factor is that it occurred to me that I could potentially make Zeta more portable by having the interpreter do much of the compilation and optimization work. The interpreter can do type-specialization, inlining and various code simplifications.

The interpreter will be designed in such a way that the internal IR it produces is optimized and in many ways very close to machine code. It should then be possible to add a thin JIT layer on top to generate actual machine code. The resulting JIT will hopefully be much simpler and easier to maintain than if one were compiling directly from the raw object-based bytecode format. Another benefit of this design is that all of the optimizations the interpreter performs will not be tied to the specifics of x86 or other architectures; they will remain portable.

At the moment, the new interpreter is at a point where it lazily compiles code into the flat internal format, but performs no other optimization. This was enough to get a 7x performance improvement over the naive interpreter, but the current system is still quite a bit below the performance level of the Python interpreter, and there is definite room for improvement. Some of the first optimizations I would like to introduce are the elimination of redundant branching instructions, and the use of inline caching to speed up function calls.

The elimination of redundant branches is fairly easy to do with BBV. Code is lazily compiled, and appended linearly into a buffer. When generating code for a branch, if the block we are jumping to is just about to be compiled, then the branch is redundant. BBV will naturally tend to generate code that flows linearly along hot code paths and branches out-of-line for infrequent code paths. That is, the most frequent path often comes immediately next in the generated code and requires no branching.
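
Continuing the sketch from above, the branch elimination boils down to one check when compiling a jump: if the target block has no compiled version yet, compile it right here and emit no jump at all, so execution simply falls through.

    def compile_jump(target_block, type_ctx, code):
        key = (id(target_block), type_ctx)
        if key in versions:
            # The target was already compiled elsewhere in the buffer, so a
            # real jump instruction is needed to reach it.
            code.append({"op": "jump", "target": versions[key]})
        else:
            # The target is about to be compiled right here: no jump is
            # emitted at all, execution falls through into the fresh code.
            get_version(target_block, type_ctx, code)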

Inline caching is a classic technique that was pioneered by the Smalltalk and SELF VMs. It’s used, among other things, to eliminate dynamic property lookups when polymorphic function calls are performed (see this excellent blog post for more information). Currently, ZetaVM, when performing a function call, needs to read multiple properties on function objects. For instance, it needs to find out how many arguments the function has, and what its entry basic block is. These property lookups are dynamic, and relatively slow. The end result is that the call instruction is very slow compared to other instructions.

Most call instructions will end up always calling the same function. Hence, dynamic overhead on function calls can largely be eliminated by caching the identity of the function being called by a given call instruction. That is, one can cache the number of arguments and entry basic block associated with a function object the first time a call instruction is run, and then reuse this information when calling the function again, provided we keep calling the same function. This information will be cached in line, in the instruction stream, right after the call instruction opcode, hence the name inline caching. I anticipate that with inline caching, function calls in Zeta can be made several times faster.
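
A toy version of such an inline cache might look like the sketch below, with the cache slots living in the flat instruction stream right after the call opcode. The slot layout and field names are invented for illustration.

    def exec_call(code, pc, stack):
        closure = stack.pop()  # the function object being called

        if code[pc + 1] is not closure:
            # Cache miss: do the slow dynamic property lookups on the function
            # object once, and patch the results into the instruction stream.
            code[pc + 1] = closure
            code[pc + 2] = closure["num_args"]
            code[pc + 3] = closure["entry"]

        # Fast path: as long as the same function keeps being called, reuse
        # the cached argument count and entry block directly.
        return code[pc + 2], code[pc + 3]

    # Tiny usage example: a call opcode followed by three empty cache slots
    add_fn = {"num_args": 2, "entry": "entry_block_for_add"}
    code = ["call", None, None, None]
    print(exec_call(code, 0, [add_fn]))  # first call fills the cache
    print(exec_call(code, 0, [add_fn]))  # second call hits the cache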


12 Comments
  1. just have fun with it
    and forget everything else

  2. tommy permalink

    this is super awesome.
    Exciting to see where this goes; this project has sparked my interest in making my own language.

  3. Very excited to read more about this! I’m working on a dynamic language that has gradual typing; would that be possible with ZetaVM?

    • Absolutely. As a matter of fact, a friend of mine has had a paper accepted at a major conference in which he uses the JIT compiler I developed for my PhD to implement gradual typing. He has gotten very good results. Zeta will use the same kind of JIT, and will also have powerful dynamic type check elimination, so it should work quite well for gradual typing :)

  4. Hi Maxime. Fantastic project! I don’t know too much about compilers/VMs/CompSci but was curious what your thoughts are on the Ethereum VM (EVM). The EVM has only one stable high level language (Solidity), and we as app/Dapp developers have experienced issues with this language, so I would like a way to experiment with language creation for the EVM. Forgive the ignorance of this question, but would it be possible to have ZetaVM as an intermediate VM target for ultimately targeting the EVM as the runtime? It would require some kind of ZetaVM=>EVM transpiler/translator or shim, right?

    I’d like to see a high level EVM language that is strongly typed (maybe Typescript or Python inspired due to popularity) and amenable to formal verification. I know we should probably just bite the bullet and target the EVM directly, but there seem to be so many cool features in ZetaVM that I’d like to use it for HLL exploration and see if some of those features can be implemented on the EVM using a ZetaVM-to-EVM transpiler or translator.

    BTW I moved from Montreal to Silicon Valley in 1999. It was a bit of a culture shock. Growing up in the boring ‘burbs (Beaconsfield) made it easier, but I agree there’s nothing here that even approximates downtown Montreal at night. :) Thanks for being so available to answer questions, and please ignore any negativity on hackernews or reddit, it’s a wonderful project!

    • Hi Andrew,

      It might be possible to compile from Zeta bytecode to EVM bytecode. I would have to look into the specifics, but since Zeta bytecode is first-class, it might be possible to write a Zeta library that does this. Strongly typed languages are definitely feasible. Would you be interested in working on this sort of thing?

      I moved to Silicon Valley and worked at Apple for a year. Now back in Montreal. The pay was great, the work environment was good too, but I ultimately felt too isolated living in the suburbs, and missed my friends and family here.

      – Max

  5. Hi Max,

    I’d certainly be interested in working on it. It would be a real learning experience for me as I’m barely a front-end web developer. I’m looking forward to giving it a try. I guess I’ll start by trying to understand both VMs. Thanks! -Andy

  6. Cristian Vasile permalink

    Hello Maxime,

    When you have time, take a look at the Unison project, a new programming platform.

    Resources:
    http://unisonweb.org/2017-04-28/new-runtime.html
    https://pchiusano.github.io

    Regards,
    Cristian

  7. Cristian Vasile permalink

    Did you study the Erlang VM, named BEAM?

    There is an interesting paper, “BEAMJIT – A Just-in-Time Compiling Runtime for Erlang”, where the authors describe their approach of reusing parts of the VM code to build the JIT.

  8. Was interested in the Lisp comment :) Combinations of the right things can give very good results.

    https://github.com/vygr/ChrysaLisp

    2 seconds full build from source.

    I’d be interested in any comments if you happen to take a look.

    Good luck with the new project.

    Chris
