Approximately 60 days ago, I decided to begin refactoring the Higgs compiler to use an SSA intermediate representation (IR) instead of one based on 3-address form. The work has taken about twice as long as I originally anticipated, but it is finally complete: the JIT compiler now generates machine code directly from the SSA IR. Inlining isn't yet implemented in the refactored JIT, but I've had time to implement peephole optimizations on the SSA IR, as well as code generation improvements that take advantage of SSA. Most of the difficulties I encountered were related to register allocation: changes had to be made to deal with phi nodes and with constant values appearing directly as instruction operands. I would say that register allocation, along with garbage collection, is among the hardest things to debug in a compiler.
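To make the two complications above a little more concrete, here is a minimal Python sketch of an SSA-style IR in which an operand can be either a constant or the result of another instruction, along with two simple peephole rewrites: folding an add of two constants, and removing "trivial" phi nodes whose incoming values are all the same (ignoring the phi itself on a loop back-edge). This is not Higgs's actual IR or optimizer. The classes `Const` and `Instr`, the opcodes, and the functions `simplify`, `peephole` and `needs_register` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)
class Const:
    """A constant operand that appears directly inside an instruction."""
    value: int


@dataclass(eq=False)
class Instr:
    """An SSA instruction; its result is the value it defines (hypothetical opcodes)."""
    op: str                                   # e.g. "arg", "add", "phi"
    args: List["Value"] = field(default_factory=list)


Value = Union[Const, Instr]


def same_value(a: Value, b: Value) -> bool:
    # Constants compare by value, instruction results by identity.
    if isinstance(a, Const) and isinstance(b, Const):
        return a.value == b.value
    return a is b


def simplify(instr: Instr) -> Value:
    """Return a simpler value equivalent to instr, or instr itself."""
    if instr.op == "add" and len(instr.args) == 2 and all(isinstance(a, Const) for a in instr.args):
        # Constant folding: add(2, 3) becomes the constant 5.
        return Const(instr.args[0].value + instr.args[1].value)
    if instr.op == "phi":
        # Ignore self-references (a loop phi feeding itself on the back-edge).
        incoming = [a for a in instr.args if a is not instr]
        if incoming and all(same_value(a, incoming[0]) for a in incoming):
            return incoming[0]
    return instr


def peephole(instrs: List[Instr]) -> List[Instr]:
    """Apply simplify() until nothing changes, rewriting uses of replaced values."""
    replaced = {}                             # id(instr) -> replacement value

    def resolve(v: Value) -> Value:
        while isinstance(v, Instr) and id(v) in replaced:
            v = replaced[id(v)]
        return v

    changed = True
    while changed:
        changed = False
        for instr in instrs:
            if id(instr) in replaced:
                continue
            instr.args = [resolve(a) for a in instr.args]
            new = simplify(instr)
            if new is not instr:
                replaced[id(instr)] = new
                changed = True
    return [i for i in instrs if id(i) not in replaced]


def needs_register(v: Value) -> bool:
    # In this simplified model, constants are encoded as immediate operands,
    # so only instruction results need a register (or stack slot).
    return isinstance(v, Instr)


if __name__ == "__main__":
    x = Instr("arg")
    a = Instr("add", [Const(2), Const(3)])    # folds to Const(5)
    loop = Instr("phi")
    loop.args = [x, loop]                     # trivial loop phi: simplifies to x
    r = Instr("add", [loop, a])
    out = peephole([x, a, loop, r])
    print([i.op for i in out])                # ['arg', 'add']
    print(r.args[1].value)                    # 5
```

After the rewrites, `r` reads `x` and the constant `5` directly, so a register allocator working on this IR has to handle operands that never need a register at all, which is one of the cases mentioned above.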
I've decided to go with an extremely naive register allocator for the moment. Register allocation simply doesn't seem to be where the performance bottlenecks are at this stage; there are much more obvious areas to optimize in order to obtain significant speedups. The good news is that the code generated by the new JIT is better than before, and performance has improved again, sometimes by a factor of 2 or more. Since I'm working on a paper to be submitted to a compiler conference shortly, my work over the next 2-3 months will focus on aspects of Higgs that tie directly into it, such as:
Eliminating known bugs and getting all V8/SunSpider benchmarks to run on Higgs
Implementing inlining in the JIT and achieving performance gains as a result
Implementing a "normal" mode without code replication in the JIT backend, for comparison purposes
Implementing benchmarking and metrics gathering infrastructure to obtain tables of results
In terms of metrics, I've already started gathering stats such as the number of dynamic type tests executed. The hope is that the code replication/specialization scheme will reduce this number significantly. Other metrics will include execution time, compilation time and the volume of machine code generated (in bytes). I'm going to need to write a script to extract all these stats and produce averages and tables. On the reliability front, I have to say I've spent several days fixing bugs. Higgs keeps becoming more robust and complete, but in the medium and long term, it would really benefit from fuzz testing. If anyone is interested, I could definitely use help with fuzz testing and debugging, as well as with profiling to find performance bottlenecks.
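The stats-extraction script mentioned above could look roughly like the following Python sketch. It assumes, hypothetically, that each benchmark run prints its stats as `name: value` lines; the stat names (`exec_time_ms`, `type_tests`), the benchmark outputs, and the functions `parse_stats`, `average_runs` and `format_table` are made up for illustration and do not reflect Higgs's actual output format. The script averages each stat over repeated runs and prints a plain-text table.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical format: one "stat_name: number" pair per line of output.
STAT_LINE = re.compile(r"^(\w+):\s*(\d+(?:\.\d+)?)\s*$")


def parse_stats(output: str) -> dict:
    """Extract the stats printed by a single run, ignoring other lines."""
    stats = {}
    for line in output.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


def average_runs(runs: dict) -> dict:
    """Map benchmark -> {stat: mean over runs}, given benchmark -> list of run outputs."""
    table = {}
    for bench, outputs in runs.items():
        per_stat = defaultdict(list)
        for out in outputs:
            for name, value in parse_stats(out).items():
                per_stat[name].append(value)
        table[bench] = {name: mean(values) for name, values in per_stat.items()}
    return table


def format_table(table: dict, columns: list) -> str:
    """Render averaged stats as an aligned plain-text table, one row per benchmark."""
    header = ["benchmark"] + columns
    rows = [
        [bench] + [f"{table[bench].get(c, float('nan')):.1f}" for c in columns]
        for bench in sorted(table)
    ]
    widths = [max(len(r[i]) for r in [header] + rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in [header] + rows
    )


if __name__ == "__main__":
    runs = {
        "richards": ["exec_time_ms: 120\ntype_tests: 1000", "exec_time_ms: 130\ntype_tests: 1000"],
        "deltablue": ["exec_time_ms: 80\ntype_tests: 500"],
    }
    print(format_table(average_runs(runs), ["exec_time_ms", "type_tests"]))
```

Averaging over several runs smooths out timing noise. To summarize speedups across a whole benchmark suite, the geometric mean of per-benchmark ratios is the more common choice.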
In other news, my collaborator Molly Everett has further improved the Higgs FFI library and started implementing a console library that provides facilities similar to the console.log function in web browsers. I decided to port some simple CSV parsing code I wrote a while back and hook it into our stdio bindings, just to dogfood our current infrastructure a little. At this point, Higgs could conceivably be used to do some useful processing on text files. Tom is already talking about writing SDL 2.0 bindings for Higgs, which would undoubtedly be very useful (and fun to play with).