Higgs: My New Tracing JIT for JavaScript

December 9th, 2012

A few months ago, I made the hard decision to change the direction of my PhD studies. I felt like I'd gotten to a point where I was stalled in my work and the goals of my thesis project had become somewhat blurred. I was experiencing the PhD blues. Something needed to be done. Rest assured, I'm still doing compiler research, I'm still studying dynamic programming languages, and I'm still working with JavaScript (JS). I'm also still part of the same research group I was previously working with. My situation isn't all that different.

Two things did change, however. One is that I'm now being co-supervised by professors Bruno Dufour and Marc Feeley. The other is that I decided to begin implementing a new dynamic language compiler and give my research a more precise direction. This new compiler, named Higgs, will have a simpler design than that of Tachyon (my previous research project) and will make use of some of the key insights I acquired during the last two years. This is not a complete rewrite: I've been enthusiastically reusing code from Tachyon wherever possible, and I'm fairly confident that Higgs, now over two months in development, will reach a level of JS support better than that of Tachyon just in time for the new year.

Higgs is mostly written in D, a systems programming language which resembles C++, but offers key advantages such as an even more powerful template system, garbage collection and basic type inference. As with Tachyon, most of the JS runtime and standard library are written in (extended) JS. Unlike Tachyon, Higgs is an interpreter which will soon be complemented by a tracing JIT compiler and a type profiling system. With Higgs, I intend to make some interesting technological bets and push concepts such as dynamic type-driven optimization as well as lazy compilation to their logical extremes.

I made the choice to use trace compilation in Higgs for a multitude of reasons. Among the most important ones is simplicity. Interpreters make it easy to add support for new features. It's also fairly easy to incrementally extend an interpreter with a simple tracing compiler. There is no need to have compiler support for all language features from the start; the tracing JIT can support compiling only a limited subset of opcodes in traces at first. Traces also have the interesting property that they can bail out at various intermediate points and jump into the interpreter. Presumably, in a compiler which uses type information to optimize traces, it would be very easy to decide that a trace needs to be invalidated to undo some optimization, and have execution fall back to the interpreter. This seems simpler than techniques such as on-stack replacement.
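To make the bail-out idea concrete, here is a minimal sketch in Python (not Higgs's actual code, which is written in D). The bytecode format, the opcode names and the functions `interpret` and `run_trace` are all made up for illustration. The trace only knows two operations, a guard and an addition, and when its guard fails it simply returns the interpreter position at which execution should resume.

```python
# Hypothetical bytecode for: while i < n: total += i; i += 1
CODE = [
    ("lt", "i", "n", "c"),           # 0: c = i < n
    ("jump_if_false", "c", 5),       # 1: leave the loop when c is false
    ("add", "total", "i", "total"),  # 2: total = total + i
    ("add", "i", "one", "i"),        # 3: i = i + 1
    ("jump", 0),                     # 4: back-edge to the loop header
    ("halt",),                       # 5: end of program
]

# Hypothetical trace of the loop body. It supports only "guard_lt" and "add".
TRACE = [
    ("guard_lt", "i", "n", 5),       # side exit to pc 5 when i < n fails
    ("add", "total", "i", "total"),
    ("add", "i", "one", "i"),
]


def interpret(code, env, pc=0):
    """Plain interpreter: runs from position pc until it reaches "halt"."""
    while True:
        ins = code[pc]
        op = ins[0]
        if op == "lt":
            env[ins[3]] = env[ins[1]] < env[ins[2]]
            pc += 1
        elif op == "add":
            env[ins[3]] = env[ins[1]] + env[ins[2]]
            pc += 1
        elif op == "jump_if_false":
            pc = pc + 1 if env[ins[1]] else ins[2]
        elif op == "jump":
            pc = ins[1]
        elif op == "halt":
            return env
        else:
            raise ValueError("unknown opcode: " + op)


def run_trace(trace, env):
    """Runs the trace in a loop; returns the pc to resume at on a failed guard."""
    while True:
        for ins in trace:
            if ins[0] == "guard_lt":
                if not (env[ins[1]] < env[ins[2]]):
                    return ins[3]  # bail out to the interpreter
            else:  # "add" is the only other operation in this sketch
                env[ins[3]] = env[ins[1]] + env[ins[2]]


def new_env(n):
    return {"i": 0, "n": n, "one": 1, "total": 0, "c": False}


def run_with_trace(n):
    env = new_env(n)
    exit_pc = run_trace(TRACE, env)     # fast path
    return interpret(CODE, env, exit_pc)  # fall back after the side exit
```

Both `interpret(CODE, new_env(n))` and `run_with_trace(n)` end in the same state. Anything the trace does not support can stay in the interpreter, which is what makes the incremental approach attractive.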

Trace compilation might also be seen as an extreme form of lazy compilation. Traditionally, static language compilers would compile and optimize all methods of a program and all methods of the standard library at once. In a JIT compiler, this is undesirable because compilation time usually overlaps with execution time. As such, it's useful to at least avoid compiling functions that are never called. If you have an interpreter available, you can potentially save more time by only compiling functions that run for a significant amount of time ("hot" functions), as these usually account for an even smaller portion of the code. In a tracing JIT, we're not even compiling whole methods; we're compiling hot paths through code. We can lazily compile only parts of functions. The Dalvik VM team from Google claims that in some cases, hot traces can be as little as 2% of a program's code.
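One common way to find hot paths is to count how often execution jumps backward to the same position, since a backward jump usually closes a loop. The sketch below is a simplified illustration in Python, not a description of how Higgs does it; the function name `hot_loop_headers`, the log format and the threshold are assumptions of mine.

```python
from collections import Counter


def hot_loop_headers(pc_log, threshold):
    """Return the positions reached by at least `threshold` backward jumps.

    pc_log is a hypothetical list of the instruction positions the
    interpreter executed, in order.
    """
    counts = Counter()
    for prev, cur in zip(pc_log, pc_log[1:]):
        if cur <= prev:  # a backward jump: treat its target as a loop header
            counts[cur] += 1
    return {pc for pc, n in counts.items() if n >= threshold}


# A loop over positions 1..3 that runs three times, then falls through to 4.
log = [0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4]
hot = hot_loop_headers(log, threshold=2)
```

Only the positions in `hot` would be candidates for trace recording; everything else stays interpreted and costs no compilation time.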

I originally intended to design the Higgs interpreter to have high-level opcodes that would directly mimic JS semantics. There would have been opcodes to read and write object properties, others that implement JS arithmetic operators, and so on. This seemed like a good idea at first, until I realized that this would imply a large amount of feature duplication between the interpreter and the tracing JIT, as both would have to implement all of JS's core semantics. Furthermore, the JS operators would be impractical to directly compile into traces, as their behavior involves many special cases, which implies several conditional branches.

I decided, instead, to go with an approach similar to that of the Tamarin tracing JIT from Adobe and have my interpreter implement higher-level operations in terms of low-level instructions. The opcodes supported by my interpreter are close to the raw functionality provided by modern CPUs (e.g. direct integer arithmetic). The JS operators are implemented in an extended dialect of JS that exposes the low-level semantics available in the interpreter. This allows the interpreter to remain simple, and will allow the tracing JIT to trace through JS primitives like it would through any other piece of code.
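As a rough illustration of what "higher-level operations in terms of low-level instructions" means, here is a Python sketch of a JS-like `+` operator built from a few primitive operations. The primitive names (`is_i32`, `add_i32_ovf`, `add_f64`) are hypothetical, and the semantics are heavily simplified: only integers, floats and strings are handled, and number-to-string conversion ignores cases such as NaN and Infinity.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


# Low-level primitives (hypothetical names), close to what a CPU provides.
def is_i32(v):
    return isinstance(v, int) and not isinstance(v, bool) and INT32_MIN <= v <= INT32_MAX


def add_i32_ovf(a, b):
    """Integer add that also reports whether the result overflowed 32 bits."""
    r = a + b
    return r, not (INT32_MIN <= r <= INT32_MAX)


def add_f64(a, b):
    return float(a) + float(b)


def to_str(v):
    # Simplified: ignores NaN, Infinity and exponent formatting.
    if isinstance(v, str):
        return v
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    return str(v)


# The high-level operator, written purely in terms of the primitives above.
def js_add(a, b):
    if is_i32(a) and is_i32(b):
        r, overflowed = add_i32_ovf(a, b)
        if not overflowed:
            return r           # common case: one integer add
        return add_f64(a, b)   # overflow: redo the add in floating point
    if isinstance(a, str) or isinstance(b, str):
        return to_str(a) + to_str(b)
    return add_f64(a, b)
```

Each branch in `js_add` is an ordinary conditional over primitive operations, so a tracing JIT can record whichever path was actually taken (for example the integer fast path) without needing a dedicated "JS add" opcode.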

There is one obvious downside to my interpreter design: it's fairly slow. I chose to go with a design that's simple, straightforward and easy to build on. I also chose low-level opcodes, which means that a simple JS operator can cause the interpreter to execute several function calls and many instructions. The common wisdom in the tracing JIT world is that you need a fast interpreter because you're going to fall back to interpreted code often. I decided to make a different bet. My tracing JIT compiler will eventually compile and link any piece of code that is run a sufficient number of times. This means that in the limit, the interpreter's speed becomes irrelevant because any piece of code that is still being repeatedly run will have been compiled.

The practical implication of this is that Higgs probably won't win against any JIT on the SunSpider benchmarks, some of which only run for a few milliseconds. Major browser vendors have poured millions upon millions of dollars and countless man-hours into optimizing every part of their JS engines for these benchmarks, and there's probably little hope I could single-handedly win in that race anyway. Instead, I intend to aim at optimizing longer-running programs, and focus specifically on aspects of JS code mainstream JITs have more difficulty tackling. I can imagine a compiler like Higgs being very effective at optimizing server-side JS code, for example.

At this point, you might be wondering what makes Higgs an academic research project. There have, after all, been several papers published about tracing JITs. The concept is no longer new, and Mozilla actually seems to be moving back to a method-based JIT for Firefox. What makes Higgs interesting is the special sauce I will be adding to the mix. The third component of Higgs will be a type monitoring system that aims to extract near-perfect type information out of running programs. When compiling traces, the JIT compiler will make use of this oracle to produce highly optimized code with far fewer guards than would otherwise be possible. I aim to begin development of this very important part of Higgs as soon as next month. Stay tuned!
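The type monitoring system does not exist yet, so the following Python sketch only illustrates the general idea of using known types to drop guards from a trace. The trace format, the `guard_type` operation and the `known_types` table standing in for the oracle are all assumptions made for this example.

```python
def elide_guards(trace, known_types):
    """Drop type guards that the (hypothetical) type oracle already proves."""
    optimized = []
    for ins in trace:
        if ins[0] == "guard_type":
            _, var, expected = ins
            if known_types.get(var) == expected:
                continue  # the oracle vouches for this type: no guard needed
        optimized.append(ins)
    return optimized


trace = [
    ("guard_type", "x", "int32"),
    ("guard_type", "y", "int32"),
    ("add_i32", "x", "y", "z"),
]

# Suppose the oracle has established that x is always an int32.
optimized = elide_guards(trace, {"x": "int32"})
```

Here `optimized` keeps the guard on `y` but not the one on `x`. The flip side is that if the assumption about `x` ever stops holding, the trace has to be invalidated and execution must fall back to the interpreter.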
