What I'd like to Know about JavaScript Programs

October 31st, 2011

The final goal of my thesis is to find ways to make programs written in dynamic programming languages (like JavaScript) run faster. One way to do this is to exploit regularities in the behavior of these programs. Dynamic languages appear to make it possible to redefine (almost) everything while a program is running. Programmers can, for example, redefine global variables (including function bindings), add properties to objects and remove them again, or even call the (in)famous eval function to redefine the value of local variables. This makes analysis and optimization of such programs difficult.
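To make these three features concrete, here is a small sketch. The names (area, point, evalRebindsLocal) are made up for illustration and do not come from any real program.

```javascript
// Hypothetical illustration of the dynamic features listed above.

// 1. Rebinding a global function while the program is running.
globalThis.area = function (r) { return 3 * r * r; };
const before = area(2);   // resolved through the global binding: first definition
globalThis.area = function (r) { return Math.PI * r * r; };
const after = area(2);    // same call expression, different code is executed

// 2. Adding and removing properties on a live object.
const point = { x: 1 };
point.y = 2;              // the object now has properties x and y
delete point.x;           // the object now has only y

// 3. A direct eval call assigning to a local variable of its caller.
function evalRebindsLocal() {
  let n = 1;
  eval("n = 'one'");      // n changes both its value and its type
  return n;
}
```

A compiler looking at a call such as area(2) cannot assume which function it reaches, or what shape point has, without some evidence about how the program actually behaves.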

It makes sense to think, however, that the dynamic features of JavaScript are not used in an unrestricted way. Programs are implementations of algorithms. They manipulate structured data in a way that makes sense to programmers. As such, the programs themselves should exhibit some predictable, regular structure in their behavior. An optimizing Just-In-Time (JIT) compiler can exploit this in multiple ways, such as predicting what a program will do next based on past behavior, and restricting the generality of the code it generates based on the extent of the dynamic behaviors exhibited by programs.

At the beginning of my Ph.D., I worked with Marc Feeley on instrumenting the WebKit JavaScript interpreter (JavaScriptCore) to gather some information about the dynamic behavior of programs. The goal was to eventually use this information to better guide our attempts at optimizing JavaScript code. This effort showed promising results, but was unfortunately abandoned mid-way because of technical difficulties: JavaScriptCore is very hard to instrument thoroughly. There are many things we simply couldn't do. There was much data we didn't have access to.

Thanks to a combined effort by Marc Feeley and Bruno Dufour, we now have a system that allows us to rewrite JavaScript source code directly in order to instrument it. Combined with a web proxy system, we believe that we may finally be able to get the information we want. This tool will (hopefully) be working reasonably soon, and so I have decided to begin gathering a list of metrics to profile in real-world JavaScript applications. These will be useful in guiding my optimization efforts.
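To give a rough idea of what source-level instrumentation can look like, here is a minimal sketch of the kind of code a rewriter might emit. This is not the actual tool; countCalls, callCounts and the example functions are hypothetical names chosen for this illustration.

```javascript
// Hypothetical registry: function name -> number of calls observed so far.
const callCounts = new Map();

// Wraps a function so that every call is counted.
// The entry is created at definition time, so functions that are
// never executed still show up in the registry with a count of 0.
function countCalls(name, fn) {
  callCounts.set(name, 0);
  return function (...args) {
    callCounts.set(name, callCounts.get(name) + 1);
    return fn.apply(this, args); // preserve the receiver and the arguments
  };
}

// What a rewritten `function square(x) { return x * x; }` might become.
const square = countCalls("square", function (x) { return x * x; });
const unused = countCalls("unused", function () {});

square(3);
square(4);
```

After running this, callCounts holds 2 for "square" and 0 for "unused". Because the instrumentation lives in the program's own source, it does not depend on the internals of any particular JavaScript engine.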

I would like to know:

What is the distribution of the length (in lines of code) of JS functions?
How often are program and library functions executed?
    What proportion of functions are never executed?
    What proportion are executed only once?
    What proportion are executed more than K times (with K ~ 10 to 100)?
What proportion of functions have loops?
    What is the average loop nesting depth, for functions that have loops?
    What is the distribution of the number of loop iterations?
What types are low-level operations usually applied to?
    What is the percentage of multiplications/additions on int64, int32, int30, uint32, floating-point values?
    How often is the addition operator used for arithmetic vs. string concatenation?
    How often are bitwise operators applied to non-integer values?
    The percentage of branches always going the same way.
How stable are the types of variables?
    How stable are function argument types and function return types?
    How stable are function return types, for a given fixed input type string?
    How stable are local variable types? What if argument types are fixed?
    How stable are global variable and closure variable types?
    How often do global variable types change?
    How often are global functions redefined (if ever)?
How are objects typically used?
    How many objects are allocated per function call, on average?
    How many fields do objects end up having?
    What proportion of allocated objects do not escape their allocating function?
    What proportion of property accesses use dynamically generated strings as keys?
    The frequencies of object property reads/writes/deletes per allocated object.
How is eval typically used?
    The frequency of eval calls.
    The frequency of eval calls redefining local/global variables.
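As an example of how one of these questions could be measured, here is a sketch for the arithmetic vs. string concatenation question. It assumes a rewriter that replaces each `a + b` in the source with a call to a helper. The names profiledAdd and addSites, and the site identifiers, are hypothetical.

```javascript
// Hypothetical per-site counters. A rewriter would assign each `+`
// expression in the source its own siteId.
const addSites = new Map();

function profiledAdd(siteId, a, b) {
  // Evaluate the addition exactly once so that any conversions
  // (valueOf/toString calls) happen only once, as in the original program.
  const result = a + b;

  let site = addSites.get(siteId);
  if (!site) {
    site = { arithmetic: 0, concatenation: 0 };
    addSites.set(siteId, site);
  }

  // The + operator yields a string exactly when it concatenated.
  if (typeof result === "string") {
    site.concatenation++;
  } else {
    site.arithmetic++;
  }
  return result;
}

// What rewritten expressions at one source location might look like.
profiledAdd("main.js:10", 1, 2);
profiledAdd("main.js:10", 1.5, 2);
profiledAdd("main.js:10", "id-", 7);
```

Classifying by the type of the result avoids reimplementing the operator's conversion rules. A site whose counters all land in one bucket is a good candidate for specialized code.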

I should specify that when I say "type" I don't just mean what would be returned by the JavaScript "typeof" operator. I mean the low-level type (number, array, object, function, string, boolean, null, undefined, etc.) as well as other information we may find useful in an optimizing compiler, such as whether a given value is a specific constant, whether numbers are integers or floating-point values, and, in the case of objects, information about the types of their properties. The goal is to capture information about the nature of values which may be useful to an optimizing compiler.
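A minimal sketch of such a classification is shown below. The function name describeType and the tag names ("int32", "double", "array") are my own choices for illustration. A real profiler would likely record more, such as constants and property types.

```javascript
// Returns a tag that is finer-grained than what typeof reports.
function describeType(value) {
  if (value === null) return "null";            // typeof null is "object"
  if (Array.isArray(value)) return "array";     // typeof [] is "object"

  const t = typeof value;
  if (t === "number") {
    // (value | 0) converts to a 32-bit signed integer. The comparison only
    // holds if nothing was lost, i.e. the value is exactly an int32.
    // -0 is excluded because it cannot be represented as an integer.
    if ((value | 0) === value && !Object.is(value, -0)) return "int32";
    return "double";                            // fractions, NaN, Infinity, large values
  }
  return t; // "undefined", "boolean", "string", "function", "object", ...
}
```

For instance, describeType(3) gives "int32" while describeType(3.5) gives "double", even though typeof reports "number" for both.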