Skip to content

Basic Block Versioning – My Best Result Yet

September 24, 2015

For those not familiar with this blog, my PhD research is focused on the optimization of programs written in dynamically typed programming languages, JavaScript (JS) in particular. In JS, every arithmetic operator and every object property access needs to do dynamic dispatch based on the type of its operands. For instance, the addition operator can work on integers, strings and floating point values. It can actually accept values of any type as operands. The types of both inputs are implicitly tested at run time so that the correct behavior can be chosen. Much of these type tests are redundant, because even in a dynamically typed programming language like JavaScript, most variables don’t change type over the execution of a program.

Earlier this year, my first paper about Basic Block Versioning (BBV) was accepted at ECOOP 2015. BBV is a JIT code generation technique I’ve been working on which is very effective for eliminating redundant dynamic type tests. In this first paper, we (my advisor and I) were able to show that our technique eliminates 71% of dynamic type tests across our set of benchmarks, resulting in significant performance improvements. Last week, I submitted a paper about an interprocedural extension to BBV. This extends on the original work we did by generalizing the technique to propagate type information through function calls, about function parameters and return types as well.

The improvements are quite striking. We’re now able to eliminate 94.3% of dynamic type tests on average, and we eliminate more than 80% of type tests on every benchmark. To put things in perspective, I decided to compare this result with what’s achievable using a static type analysis. I devised a scheme to give me an upper bound on the number of type tests a static analysis could possibly eliminate. First, execute all the benchmarks and record the result of each type tests. Then, re-execute the benchmarks with the type tests that always evaluate to the same result removed. This is equivalent to using a static type analysis with access to “perfect” information about which type tests are going to be redundant. The resuls of this experiment are shown in the graph below:

testcounts

I was very pleased when I first saw these results. The “perfect” static analysis eliminates, on average, 91.7% of type tests, which is less than what we achieve with interprocedural BBV. You might be wondering how this is possible, how BBV can possibly eliminate more type tests than what should be an upper bound on the number of type tests that can be eliminated. The main point is that the analysis is just an oracle that tells us whether any given type test is going to be redundant and safe to eliminate or not. In contrast, BBV has the power to selectively duplicate sections of code, which makes it possible to eliminate even more type tests.

The main reason that code duplication (or versioning), is useful, is that it allows BBV to separate out contextual information which wasn’t present in the original untransformed program. If you want a simple example, think of a loop where some variable x is an integer in the first iteration, and then becomes a string in every subsequent iteration. A traditional type analysis will see that this variable could be either an integer or a string, and conclude that we don’t know what type this variable will have at run time. In contrast, BBV might be able to unroll the first iteration of the loop, and know that it will be integer in this first iteration, and a string in every other iteration, thus eliminating all type tests on the variable x.

There are interesting implications to the results obtained with BBV, in my opinion. I was recently talking to someone at Facebook about HHVM, their virtual machine for PHP and Hack. They were telling me that Hack has gradual typing, and that this information isn’t yet used to optimize code in HHVM. Based on the results I got, I would say that there is probably no real need to type-annotate code in order to improve performance. Gradual typing can be useful for documentation, tools and safety, but it doesn’t really offer performance advantages. BBV can already eliminate over 94% of type tests on average, and this is only the beginning, there are still many easy ways to further improve upon these results. Alternatively, flipping what I just said on its head, if a JIT compiler can known the types of most variables at code generation time, the type assertions introduced by gradual typing can be implemented at little to no cost.

From → Grad studies, Higgs

8 Comments
  1. Anon permalink

    You should take a look at Tcl/Tk and the Tcl_Obj infrastructure that was built into Tcl starting with version 8.0 many many years ago. It shares many similarities with your research to the extent you describe the research above.

  2. anon permalink

    > Is tcl/tk JIT-compiled?

    Yes, into a bytecode representation that is executed by the execution engine.

    > Can you point me to some specific source code?

    You can find the source distro bundles from this page http://www.tcl.tk/software/tcltk/, however as far as pointing you to a specific file, that I can not do. Several of the core maintainers do hang out in the Usenet group comp.lang.tcl and so if you were to post specific source questions there they would likely respond with more useful answers than I can.

  3. Jules permalink

    Nice result. The biggest performance problem of type tags isn’t the checking, but the tagging and pointer chasing. So there might be bigger performance gains possible yet. Do you have any ideas on how to do this with BBV?

    • My implementation does unboxing of local variables automatically and stores object properties unboxed. Pointer-chasing, I think this is an unsolved problem in dynamic languages. You would need something like automatic object inlining. People have worked on it, there are a few publications on the topic, but AFAIK, no major compiler does it. It’s a problem people have paid no attention to.

Trackbacks & Pingbacks

  1. Have Static Languages Won? | Pointers Gone Wild
  2. Dualismus hardwaru a softwaru, strojů a virtuálních strojů | Funkcionálně.cz
  3. ZetaVM, my new compiler project | Pointers Gone Wild

Leave a comment