Higgs: the First JIT Compiler of its Kind

October 31th, 2013

I submitted a conference paper to Compiler Construction 2014 (part of ETAPS) just two weeks ago. This paper details my recent work on compilation with the basic block versioning algorithm, an alternative to tracing. This work was briefly explained in my DConf 2013 presentation, as well as the talk I gave at Strange Loop, which should hopefully be made available online on InfoQ within a few weeks. I unfortunately won't know if my paper makes it into CC 2014 until mid-late December, but in the meantime, while keeping my fingers crossed, I'm already planning the third iteration of the Higgs JIT compiler.

Basic block versioning can compile multiple versions of a given basic block based on context information (e.g. type information) accumulated during compilation. It's a formalized algorithm for tail duplication which allows a compiler to eliminate type tests, hoist type tests out of loops, and duplicate code paths where multiple types are possible at run time. This approach is similar to tracing JIT compilation in that, like tracing, it's preoccupied with accumulating type information along linear sequences of code, and using this accumulated information to specialize code farther along, with the important difference that we're working at a lower granularity, that of basic blocks, instead of traces. I believe that the paper on basic block versioning my advisor and I submitted is an original and interesting contribution, but I'm not stopping there, I already have many improvements planned.

The basic block versioning algorithm as implemented in the current version of Higgs (as available on GitHub) works at the intraprocedural level, and one function at a time. Type information does not currently propagate past function call boundaries. This means that Higgs must do aggressive inlining of function calls in order to eliminate type tests with any amount of success. The runtime library primitives are written as JavaScript functions, which must also be inlined for performance, but inlining, as is currently done in Higgs, is unfortunately rather expensive. The code for most JavaScript primitives must deal with many possible input types, and so primitives end up being relatively large functions. Inlining large functions takes time, but most of the code being inlined is actually useless as most JavaScript operators are not used in a polymorphic way.

The next iteration of Higgs will try to solve this problem by moving away from method-based compilation. The new Higgs JIT will lazily and incrementally compile functions one basic block at a time. Parts of functions that are not executed will never be compiled. I intend to push this approach further, and inline function calls incrementally as well, which means that in many cases, only small parts of functions will actually be inlined. There is significant potential for compilation time savings. There is also the potential for making better inlining and optimization decisions, through smart use of profiling during incremental compilation. Basic blocks, when first compiled, can be made to collect profiling information. These same blocks can then trigger recompilation later to produce more efficient code.

Won't incremental compilation result in a horrible mess of machine code, you ask? I devised a scheme where blocks can be recompiled, garbage collected and relocated so as to make the machine code more efficient. I won't get into all the details just yet, but I've started examining the technical challenges and I'm quite convinced this is possible. Not only is it possible, it will mesh with basic block versioning in very interesting ways, and open up many possibilities. As far as I know, Higgs is going to be the first compiler to feature incremental lazy compilation of functions and incremental inlining (if anyone has references to prior art, please let me know). Even if Higgs isn't the first at this though, the combination of lazy incremental compilation and basic block versioning will make it a truly unique beast.

A special thanks to Marc Feeley, Paul Khuong, Luke Wagner, Benjamin Cerat, Molly Everett and Erinn Di Staulo for helping me review my CC 2014 paper prior to submission.

Enjoyed this post? Share it with your online community!
Share to Hacker News