Code Compactness and Optimization Levels

October 31th, 2011

I've already mentioned how Tachyon is presently pretty slow in terms of compilation speed. It takes about 12s to compile 8000 lines of JavaScript from source to machine code, which is much too slow, I'm sure most JavaScript users would agree. One reason is probably that the implementation of the compiler itself is not as effective as it could be. I already know the backend is not that fast, and I plan to look into that some more. Something else I came to realize, however, is that Tachyon is actually compiling a lot of code.

One of the benchmarks I'm compiling as part of my unit test suite is a piece of Tachyon code that deals with the Foreign Function Interface (FFI). There's this function called initFFI that serves to register C functions into Tachyon so that they can be used by our compiler. It's about 100 lines of code or so. Unfortunately, compiling this very function, and just that one, takes over 10 seconds! I found out that this function was triggering some kind of pathological case in the compiler. Further investigation revealed that the function in question produces a total of over 700 basic blocks in its Control Flow Graph (CFG). Furthermore, during some transformation passes, this number grows up to over 2200 basic blocks. That's one huge wad of code for Tachyon to process, much more than I expected.

Why is this? Well, when I implemented the Abstract Syntax Tree (AST) to Intermediate Representation (IR) translator, I was trying to be clever. I wanted the final code to be fast. In this spirit, I decided to inline the whole sequence of low-level actions that goes on during a JavaScript "new" operation each time the said operator was used. Unfortunately, initFFI has at least 50 "new" expressions. This results in very bloated code, which is slow to compile. Slow to compile, but faster, right? Well, perhaps... But then, as the name implies, initFFI only executes once, and it probably completes its execution in less than 10 milliseconds. Hence, initFFI is probably not worth optimizing, and thus, not worth spending much time compiling.

My conclusion is probably that I need for Tachyon to initially compile code into a form that's less expanded, with as little inlining as possible. Something compact, that will compile fast. Only the functions that run for a long time should be optimized (and recompiled). Clearly, only those are worth spending time on. This is somewhat of a realization (or "duh!" moment), for me. The fact is, most modern Just-In-Time (JIT) compilers have more than one optimization level. Most of them incorporate a fast baseline compiler, and only optimize methods when they've been shown "hot" (they run for a long time, overall). I simply didn't realize how important this was, and that it was relevant for Tachyon.

Marc Feeley, my advisor, inspired me to do a little experiment. I decided to temporarily disable the automatic inlining on much of my primitives, and to not inline the constructor calls with zero parameters (others are inlined). The result is that my unit tests (which are mostly compilation tests) run over twice as fast as before. I know I could remove even more inlining if I look into this some more. This is something I'll need to think a little more about. It seems clear to me though, that my High-Level IR (HIR) should probably be compact, minimal, and not involve any inlining. The standard, default compilation path should be made fast.

The next thing I want to decide is whether or not my HIR will involve special "HIR instructions" that can map to code generators and/or primitive calls, or map directly to primitive calls. Right now, I'm thinking they are quire similar options, but the first one may be more flexible, and more practical to work with in terms of analyses. It's also possible that one HIR instruction will map to many variants of a given primitive.