New x86 Backend and... JavaScript Macros?

September 28th, 2011

New Backend

I've finally completed the rewrite of the Tachyon backend. This is the part of the compiler that takes our SSA form intermediate representation and turns it into machine code. It currently supports x86 32 and 64 bit. I decided to go for a rewrite for two reasons. The first is that there were a number of issues that bugged me about the previous backend (compilation speed, performance of the code, bugs, lack of features). The second is that I wanted to make sure that I fully understood everything that went into the backend, seeing as I will most likely have to heavily modify it during the course of my thesis. So far I'm pleased with the result. The new design seems fairly robust and extensible and I haven't encountered too many bugs.

The backend is written in JavaScript from the ground up and only uses JS extensions to allocate memory blocks and write integer values into them. I wrote a custom x86 assembler based on a table of instructions that describes each supported encoding for a given instruction. It's essentially the same tables that are found in the Intel/AMD CPU manuals. This makes it very quick to add support for new instructions (just add a few lines for its encodings!). I made that choice mostly because the x86 instruction set is very ad-hoc, and only keeps being extended.

The register allocation is currently done greedily and on-the-fly (interleaved with the instruction selection). I made this choice for JIT compilation speed and to keep the compiler simple. If register allocation turns out to pose a speed problem, I'm fairly confident that I can improve it by tweaking the allocation heuristics. The backend has also been designed in such a way that it would be possible to replace the register allocator without breaking the rest of the backend.

Finally, the backend incorporates a peephole optimizer to try and improve the generated code. Instead of writing machine code bytes directly while the instruction selection occurs, instruction objects are queued into a list so that the instruction sequence can be optimized in a separate pass. This allows the instruction selection to remain fairly simple (no need to try and generate optimal instruction sequences in all cases). Suboptimal instruction sequences can be replaced by more efficient ones, redundant moves and other instructions can be removed. This pass takes only about 5% of the backend time so far, and removes a lot of bloat.

Performance Issues & Macros

As I was testing the backend and ironing out a few bugs, it became obvious that my new backend was much slower than the old one. Compiling the Tachyon standard library (~7000 lines of code) took over 2 minutes in total, which was unacceptable. I started looking into the cause, and it rapidly became obvious. I used a lot of assertions within the source code to help with debugging.

These have the form:

assert(condition_expr, error_string);

For example:

assert(value instanceof Function, 'value is not a function: ' + value);

The problem is that these assertions were implemented in our system as function calls, and as such, the error string was always evaluated, no matter whether the condition was true or false. The result is that some (sometimes very long) error strings were generated all over the place, even if they were never going to be printed. In the example above, a string representation of value would be produced and a string concatenation would occur each time the assertion was evaluated.

Replacing the worst offending assertions of the backend by if statements shortened the compilation time from above 2 minutes to about 12 seconds. I suspect it could be reduced further by eliminating more of these. Getting rid of assertions isn't really the solution I'd like, however. The conclusion I came to is that assertions really should be expressed not as function calls, but as macros.

The previous example assertion, for instance, should be automatically replaced by:

if ((value instanceof Function) === false)
    error('value is not a function: ' + value);

Realistically, it would be fairly easy to implement a specialized AST transformation for assertions which replaces assert function calls by inlined code of the form above. However, Tachyon also has a logging system that has calls such as "log.debug(string_expr)" which only print in debug mode. For performance reasons, those should ideally also be expressed as macros. Hence, I came to the conclusion that it would make sense to implement some kind of macro system in Tachyon.

Right now, I'm trying to decide whether I want something simple like C's macro system, which does text-based replacement, or a system operating at the AST level where macros are pseudo-functions where the argument variables are replaced by the argument expressions. One annoying detail, though, is that as long as Tachyon runs under V8, I will have to implement the same macro system into both Tachyon and V8 if I want the code to work the same in both.