Static vs Dynamic: Why not Both?

November 10th, 2012

Every once in a while, a blog post pops up where someone advocates that static languages are better than dynamic languages, or vice versa. Static language advocates will claim that static languages offer undeniable safety and speed advantages. Those who like dynamic languages will tell you that expressiveness and productivity gains are invaluable. Having programmed in both C++ and JavaScript for several years, I feel that both points of view are correct. Having written both dynamic and static programming language compilers, I also feel that both kinds of languages are really two sides of the same coin.

From my perspective, as a compiler writer, the difference between static and dynamic languages is a simple one. In static languages, variables are assigned a fixed type at compilation time. In dynamic languages, variables can change type over the course of a program's execution. Dynamic languages are not "untyped", they are typed, sometimes even strongly typed, but their types are associated with values rather than variables. This distinction becomes somewhat blurred by the fact that static languages all offer ways to perform dynamic dispatch, and some even offer sum types. Conversely, in most programs written in a dynamic language, most variables will not change type over the execution of a program.

Static languages have definite strengths. As of yet, most dynamic languages are much slower than Java and C. There is also some truth to the idea that it becomes "messy" to maintain programs written in dynamic languages as they grow. This is in part due to often poor tool support. The type annotations present in Java and C make it very easy to implement tools to help with documentation and refactoring. On the other hand, programming in JavaScript feels much more productive than programming in Java, C++, ML or D. Duck typing allows polymorphism without rigid declarations. Object and class literals allow declarative programming on a level templates can't match. Run-time code generation allows easy JIT compilation of custom code.

For the past few months, I've been working on dataflow-based static type analysis for JavaScript. These analyses are essentially a form of type inference that tries to compute the set of possible types a variable (or object property) can have at run-time. This is different from ML-style Hindley-Milner type inference in that flow-based analyses can allow variables to have more than one possible type. The obvious aim of my work, and that of others who have worked on this problem, is to facilitate tool support and allow better optimization of dynamic languages. There's one unfortunate issue with JavaScript and static type analysis though: JavaScript wasn't designed for type inference, and constructs like eval make compile-time type inference problematic at best.

What if, through type inference, you could have all the expressiveness and flexibility of dynamic typing with the speed and robustness of static typing? We have the technology, but making this happen would likely require a new programming language design. Retrofitting type inference on existing dynamic languages such as Python, Ruby and JavaScript has proven to be problematic, largely because these languages were never designed with type analysis in mind. They integrate features that make type analysis and optimization unnecessarily difficult. Contrast this with the ML programming language, which was designed in conjunction with its type analysis. ML doesn't have to deal with constructs which are fundamentally incompatible with type inference: ML is the embodiment of the capabilities of the Hindley-Milner method.

What if we could do the same thing for dynamic languages? What if we designed a dynamic language in conjunction with the capabilities of flow-based points-to type analyses? A language that is dynamically typed, but for which we have an algorithm that allows us to know all possible types statically. What would that look like, and what advantages would it have? I propose that it would be possible to design programming languages which have the strengths typically associated with both static and dynamic languages. Imagine, for example, a language that is similar to Python and JavaScript and which possesses the following features:

No need to declare types for variables or function arguments
New fields can be added to objects dynamically
Variables can change type dynamically, duck-typing is fully supported
Refactoring tools which can determine the possible types for all variables
Documentation tools which take advantage of type inference (e.g. doxygen)
Can be statically compiled and linked directly with C code
Can be interpreted and tinkered with in a shell environment (i.e. REPL)
Type errors are detected at compilation time in most cases (e.g. accessing missing object field)
Performance comparable to that of Java (~50% of the speed of C code in most tasks)

Such a programming language is possible with the knowledge we have today, and I believe there is definitely a large niche for it to fill. Many programmers out there love Python for its flexibility and ease of use. However, Python, due to its lack of performance, is currently unusable in many application realms. Furthermore, dynamic languages are still vastly inferior in terms of tool support. Having built-in support for type inference would make it possible to have competent IDEs with all the capabilities people are used to when working with Eclipse, for example.

Now, I said that some language features, such as eval, make type inference difficult. You might be wondering what kind of unholy sacrifices we'd have to make when designing a dynamic language that is amenable to type analysis. The answer, I believe, is that we wouldn't have to sacrifice much. In its most common form, eval is problematic because it can access any variable in the program: it's simply too unpredictable for type analysis, because it could do almost anything to a program's state. There is a simple fix for this: don't let eval write to program variables! Our new language could simply let eval execute in a restricted environment passed as argument. This is safer, and eliminates the problem without getting rid of the useful construct that is eval.

Other design compromises might have to be made to make type analysis more effective. I've suggested in the past, for example, that it would be easier to infer types in JavaScript if the language had both integer and floating point types. Throwing an exception when missing object fields are accessed (instead of producing the undefined value) would help type analysis as well. Such compromises are actually quite sensible: they make type inference easier without reducing expressiveness or getting in the way of programmers. In fact, throwing exceptions early, for example, actually helps debug programs more effectively!

I believe that often, there is a "right" time for ideas to become mainstream. The time for this idea will come soon. There is a niche to be filled, there is a part of programming language design space we haven't really explored, and we have the technology to make it all happen. I don't know that I have time to create such a programming language myself, but I'm already seeing hints that others are thinking along the same lines that I am. The statically type inferred, high performance dynamic language will happen eventually, and probably sooner than you think.