Type Analysis for JavaScript

December 12th, 2011

A few weeks ago, I've begun working on a type inference analysis for JavaScript based on the forward dataflow paradigm. The goal of this analysis is to gather information about the types of variables and object fields in JavaScript programs so that they can be better optimized. More specifically, the analysis should allow a JavaScript compiler to obtain some guarantees about the behavior of programs to compile, allowing it to generate more specialized code.

When compiling JavaScript code, a naive JavaScript compiler has to assume that the + operator can be either addition or string concatenation, and produce code that can handle both cases, with the associated dynamic dispatch overhead. This is rather problematic, because all JavaScript operators have multiple cases to handle based on the input types. The results is large amounts of inefficient code. If we have guarantees about the input types of operators, however, we can potentially generate (much) more optimized code. Optimizing dynamic languages based on type inference is the focus of my Ph.D. thesis.

I recently implemented a prototype analysis that is partially flow-sensitive. The results were fairy promising, at least on simple programs. The analysis was able to successfully infer the types of most variables and object fields, and seemed to run at an acceptable speed. Another nice thing is that the analysis can be run incrementally, and doesn't need to analyze parts of programs that can be shown never to run. The main weakness so far, however, is that my analysis is still relatively weak when it comes to inferring that properties must definitely exist on objects. It also doesn't benefit very much from aliasing information that seems obvious when looking at the source code of the programs being analyzed... And my instinct tells me that if it's so plain and obvious, there should be a way to design an analysis that can infer it.

Following a discussion with professor Bruno Dufour, I've decided to implement a second prototype of my analysis that will be fully flow-sensitive, and based on the points-to analysis paradigm. Such analyses are typically more costly in terms of execution time and memory usage, but I'm hoping to find simplifying assumptions that can help make the performance acceptable. In any case, a flow-sensitive analysis should provide me with some sort of upper bound on the precision of the type information that can be inferred statically, and give me more insight into how to design more precise and efficient analyses in the future.

In other news, I've started working on Tachyon's Garbage Collector (GC). This will be a simple stop-and-copy semi-space collector using Cheney's algorithm. I'm implementing the GC itself in Tachyon's extended JavaScript dialect, as opposed to writing it in C, as I feel this will integrate better with the rest of the system. So far, I've written code to traverse objects in memory and to traverse the stack from top to bottom. The main component missing is traversal of compiled functions, which I should be implementing this week. I'm hoping all this won't be too buggy once I integrate it all together. Rumors are GCs are very hard to debug. I'll be finding out soon enough.