Basic Block Versioning - My Best Result Yet

For those not familiar with this blog, my PhD research is focused on the optimization of programs written in dynamically typed programming languages, JavaScript (JS) in particular. In JS, every arithmetic operator and every object property access needs to do dynamic dispatch based on the type of its operands. For instance, the addition operator can work on integers, strings and floating point values. It can actually accept values of any type as operands. The types of both inputs are implicitly tested at run time so that the correct behavior can be chosen. Much of these type tests are redundant, because even in a dynamically typed programming language like JavaScript, most variables don't change type over the execution of a program.

Earlier this year, my first paper about Basic Block Versioning (BBV) was accepted at ECOOP 2015. BBV is a JIT code generation technique I've been working on which is very effective for eliminating redundant dynamic type tests. In this first paper, we (my advisor and I) were able to show that our technique eliminates 71% of dynamic type tests across our set of benchmarks, resulting in significant performance improvements. Last week, I submitted a paper about an interprocedural extension to BBV. This extends on the original work we did by generalizing the technique to propagate type information through function calls, about function parameters and return types as well.

The improvements are quite striking. We're now able to eliminate 94.3% of dynamic type tests on average, and we eliminate more than 80% of type tests on every benchmark. To put things in perspective, I decided to compare this result with what's achievable using a static type analysis. I devised a scheme to give me an upper bound on the number of type tests a static analysis could possibly eliminate. First, execute all the benchmarks and record the result of each type tests. Then, re-execute the benchmarks with the type tests that always evaluate to the same result removed. This is equivalent to using a static type analysis with access to "perfect" information about which type tests are going to be redundant. The resuls of this experiment are shown in the graph below:

testcounts
testcounts

I was very pleased when I first saw these results. The "perfect" static analysis eliminates, on average, 91.7% of type tests, which is less than what we achieve with interprocedural BBV. You might be wondering how this is possible, how BBV can possibly eliminate more type tests than what should be an upper bound on the number of type tests that can be eliminated. The main point is that the analysis is just an oracle that tells us whether any given type test is going to be redundant and safe to eliminate or not. In contrast, BBV has the power to selectively duplicate sections of code, which makes it possible to eliminate even more type tests.

The main reason that code duplication (or versioning), is useful, is that it allows BBV to separate out contextual information which wasn't present in the original untransformed program. If you want a simple example, think of a loop where some variable x is an integer in the first iteration, and then becomes a string in every subsequent iteration. A traditional type analysis will see that this variable could be either an integer or a string, and conclude that we don't know what type this variable will have at run time. In contrast, BBV might be able to unroll the first iteration of the loop, and know that it will be integer in this first iteration, and a string in every other iteration, thus eliminating all type tests on the variable x.

There are interesting implications to the results obtained with BBV, in my opinion. I was recently talking to someone at Facebook about HHVM, their virtual machine for PHP and Hack. They were telling me that Hack has gradual typing, and that this information isn't yet used to optimize code in HHVM. Based on the results I got, I would say that there is probably no real need to type-annotate code in order to improve performance. Gradual typing can be useful for documentation, tools and safety, but it doesn't really offer performance advantages. BBV can already eliminate over 94% of type tests on average, and this is only the beginning, there are still many easy ways to further improve upon these results. Alternatively, flipping what I just said on its head, if a JIT compiler can known the types of most variables at code generation time, the type assertions introduced by gradual typing can be implemented at little to no cost.