The Incremental JIT Lives

December 10th, 2013

It's been about a month since I started refactoring the Higgs JIT to implement a lazy/incremental compilation strategy. The new JIT will combine the basic block versioning scheme I previously implemented with lazy compilation of basic block versions and eventually incremental inlining of function calls. Higgs will also no longer have an interpreter, everything will be JIT-compiled on the first execution. This is a natural extension of basic block versioning which I believe will bring many interesting possibilities, such as the ability to recompile or invalidate code at the level of individual basic blocks. This means, for example, that function call sites, if they are executed often enough, could be recompiled so as to inline the callee, but without recompiling the entire caller function. Other possible uses include making optimistic assumptions about the types of values, such as global variables, and being able to invalidate only the blocks making these assumptions if the said assumptions are violated.

This is a significant refactoring, but it's not the first time I rebuild the JIT, and I've gotten fairly efficient at it. I already have lazy code generation, most of the runtime library, and the Higgs REPL working. Many of the basic unit tests are working, the crucial bits are in place, but there are still many important parts missing (register allocation, garbage collection, exception hanling, and more). Since I want Higgs to eventually be considered a serious and usable compiler, I decided to create a dev-maxime development branch and avoid feature regressions on the master branch. The new JIT will make it to the master branch once all tests and benchmarks work, which should hopefully happen within two months or so.

The most difficult aspect of this work so far has been to come up with a coherent and efficient design for the incremental code generation. I decided to sketch things up and begin experimenting with the parts of the implementation that I thought would be most problematic right away, so that I could know about the problems that might arise early on. Much exploratory programming was involved, but the key design decisions have been made. Machine code is generated in one pass and directly appended to an executable heap. It's generated in multiple small fragments which all contain position-independent code. The fragments may end with branches which can refer to other fragments. Each fragment keeps a closure that knows how to write its final end branch, so that these branches can be rewritten when the fragments are relocated.

In other news, I'm still waiting to know if my paper about basic block versioning is accepted at Compiler Construction 2014, I should have the answer by December 20th. I did get some very positive news since my last blog post, however. I've received (and accepted) an invitation to present Higgs and my research work at the mloc.js conference, which takes place mid-February in Budapest. Like Strange Loop, this conference will be filmed and the videos will be made available online. I've also received an invitation to present at Web Directions Code 2014. I'm very much looking forward to sharing my latest research findings.

My thesis advisor has recently shown more openness to the idea that I could do an industry internship before the end of my PhD, which seems like it would be a good move career wise. This would probably occur around December 2014 or January 2015, after the next round of conference deadlines. I've been approached by people from Apple (LLVM team) and IBM (JIT and dynamic language runtimes). Back at DConf 2013, I also had the opportunity to meet Andrei Alexandrescu and other members of the HipHop VM team. They seemed very interested in my work on basic block versioning and in getting me to come do an internship at Facebook. Most likely, I'll try approaching people at Mozilla and Facebook to see if I could possibly work on either IonMonkey or HipHop VM as part of an internship.

When I started my PhD, I originally wanted to do research in machine learning. Unfortunately, after only a few months, I discovered that field wasn't really for me. I found it to be overly theoretical, mathematical and dry for my taste. My creativity wasn't really there. Instead, I found myself thinking back to my MSc project, and having several ideas about how to improve upon the design of McVM. I decided to follow my passion, change advisors and do a PhD in compiler design instead. Compilers are not really a glamorous CS subfield, but I think this may have turned out to be a good career choice. There are clearly more jobs in compilers than there are compiler experts.