
Musings on AOT, JIT and Language Design

November 28, 2015

Lately, I’ve been pondering the strengths and weaknesses of static and dynamic languages. In particular, the reaction of the online programming community to Elben Shira’s post The End of Dynamic Languages made me realize that many (most?) programmers out there seem to prefer static languages. This seems to be largely because static languages usually come with better IDE and tool support, as well as static guarantees which programmers value highly (and rightly so).

I’m slowly working on a design for a programming language of my own, and I had a “wait a minute, let’s take a step back” kind of moment. Why? Because the language I was working on was going to be extremely dynamic. It was going to be JIT compiled, dynamically typed, with an extremely powerful macro system based on the ability to hijack and modify the parser while a program is being parsed. I was also considering using a self-modifying Intermediate Representation (IR) as part of the design of the JIT compiler.

Elben’s post made me take a step back because it made me realize that the design of my ultra-dynamic language would make it near-impossible to have much tool support. If you can hijack the parsing of a program while it’s being parsed, and alter the parser using custom code you wrote, then good luck running any kind of static analysis. It’s just not going to work.

I saw that I was being myopic. I use a bare-bones text editor, no tools, and I’ve only used a debugger a handful of times in the last few years. That’s the way I program, but it’s not the way most people program. I can get away with this because the projects I’ve worked on have all been under 100K lines of code. Designing a language which makes any kind of static analysis, refactoring assistance or autocompletion near-impossible is a sure way to guarantee the language won’t gain adoption outside of hobbyist circles.

Then, there’s the issue of JIT compilation. I very much like JIT compilers because that’s where my expertise lies; I wrote my own JIT as part of my PhD thesis. Still, I have to keep an open mind and ask myself what’s best for my language. JIT compilers are fundamentally more powerful in terms of optimization capabilities than AOT (Ahead-Of-Time or static) compilers, because they can adapt the way they optimize code based on the behavior of running programs. They’re great for dynamic languages.

Unfortunately, it seems that the world, at the moment at least, is stepping away from JIT compilation. In order to support mobile platforms or compile to WebAssembly, it seems to me that JIT compilation is impractical. Why? Because it requires generating machine code on the fly. At the moment, it’s unrealistic to just plop a custom binary with self-modifying machine code on the Apple or Android app stores.

I’m thinking I might just design my language so that it compiles to plain old C. This would make it possible to leverage the C compilers that already exist for every platform out there. Rest assured though, this language will be dynamically typed ;)
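
To sketch what compiling to C could look like: here is a toy (and entirely hypothetical) code generator, written in JavaScript for illustration, that lowers a tiny dynamic-expression AST to C source text. The `val_int` and `val_add` helpers are made-up names for a tagged-value runtime that the generated C would link against.

```javascript
// Toy sketch: lower a tiny dynamic-expression AST to C source text.
// val_int/val_add are hypothetical runtime helpers for tagged dynamic values.
function emitC(node) {
  switch (node.op) {
    case "const":
      return "val_int(" + node.value + ")";
    case "add":
      return "val_add(" + emitC(node.lhs) + ", " + emitC(node.rhs) + ")";
    default:
      throw new Error("unknown op: " + node.op);
  }
}

// The expression 1 + 2 lowers to a call tree over tagged values:
const cSource = emitC({
  op: "add",
  lhs: { op: "const", value: 1 },
  rhs: { op: "const", value: 2 },
});
// → "val_add(val_int(1), val_int(2))"
```

The appeal of this approach is that all the platform-specific work (instruction selection, calling conventions, low-level optimization) gets delegated to C compilers that already exist everywhere.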

  1. > JIT compilers are fundamentally more powerful in terms of optimization
    > capabilities than AOT (Ahead-Of-Time or static) compilers

    I don’t think this is true. First, a lot of information is lost by the
    point you’ve compiled to a bytecode (like typing information and the
    high-level structure of the code, which is useful for some types of
    optimizations). Second, you simply can’t afford some of the optimizations,
    because you’re doing compilation at runtime.

    I think if we consider all the optimization passes that, for example, a
    functional language compiler does, we could come up with a list of
    optimizations that are either impossible or very hard to do using JIT.

    For example, imagine common subexpression elimination (CSE) as done in some
    functional programming language compilers. This is simply impossible to do
    in a JIT compiler. Imagine an expensive `f(x)` that appears twice in a
    function. How could you optimize this program at runtime to run `f(x)` only
    once?
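
    For readers unfamiliar with CSE, here is a minimal sketch of the transformation under discussion, assuming a hypothetical `f` that the compiler can prove pure:

    ```javascript
    // Hypothetical pure function (no side effects; result depends only on x).
    function f(x) { return x * x + 1; }

    // Before CSE: f(x) is evaluated twice.
    function before(x) { return f(x) + f(x); }

    // After CSE: the common subexpression is hoisted into a temporary.
    // This is only safe if the compiler can prove f has no side effects.
    function after(x) { const t = f(x); return t + t; }
    ```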

    As a second example, imagine partial evaluation (PE). It’s certainly an
    optimization, and sometimes a very powerful one that does great things (see
    Futamura projections). But it doesn’t even make sense to talk about doing
    it at runtime in a JIT compiler, because by definition, you should be doing
    the work at compile time to leave less work for runtime.
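
    As a concrete (toy) illustration of PE: given a statically known exponent, a specializer can generate a residual function with the loop fully unrolled. Real partial evaluators are far more general, but the shape of the idea is this:

    ```javascript
    // A generic power function.
    function pow(x, n) {
      let r = 1;
      for (let i = 0; i < n; i++) r *= x;
      return r;
    }

    // A toy partial evaluator: given a known n, emit a residual function
    // in which the loop has been completely unrolled away.
    function specializePow(n) {
      const body = "return " + (n === 0 ? "1" : Array(n).fill("x").join(" * ")) + ";";
      return new Function("x", body);
    }

    const cube = specializePow(3); // behaves like x => x * x * x
    ```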

    IMO there are certainly optimizations that a JIT does better (or that maybe
    only a JIT can do), but I don’t think it’s fundamentally “more powerful”.

    • 1. JIT compilers do not need to work from bytecode.

      2. Multi-stage JIT architectures address your concerns about cost.

      3. JIT compilers have access to more information about your program.

      4. You’re entirely mistaken; partial evaluation and CSE are already done in JIT compilers.

      • osa1 permalink

        I’m curious, can you show me some papers/JIT implementations that

        – Don’t work from bytecode
        – Are multi-stage
        – Do partial evaluation and CSE


        • The JVM builds its own IR from bytecode, doesn’t work with bytecode internally, is multi-stage, and does partial evaluation and CSE. V8 as well. They’re both very sophisticated compilers that do quite a lot of work at the higher optimization levels.

        • Ryan permalink

          Both IonMonkey (firefox) and v8 (chrome) perform common sub-expression elimination (actually global value numbering, which is more sophisticated), as well as loop-invariant code motion, dead code elimination, bounds check eliminations, and many other optimizations.

    • It’s obviously nonsense to say that it’s impossible for a JIT to perform the same optimisations as an AOT. If you give them the same input, why on earth would they not be able to achieve the same result?

      You say ‘you’ve lost information in the byte code’ – well then don’t throw that information away when you create bytecode – encode it somehow, or operate on ASTs instead.

      You say you don’t have time to do the same optimisations. Why is time limited? If you have a program running for days you can spend hours compiling if you really wanted to. You’d be up and running quicker than an AOT which took hours as well.

      Why on earth would CSE be impossible? If there’s an algorithm that works for AOT, do you think it suddenly stops working in JIT?

      Regarding PE – some JITs are built using PE (see Truffle) so again this is obviously nonsense.

      I would argue JITs are fundamentally more powerful, as they can do everything an AOT can do, plus they have accurate profiling information.

  2. Brian permalink

    Sometimes I do worry about academics! In the real world we need languages that engineers/programmers can use reliably and safely. When I first saw a dynamic language, I remember thinking I hoped these would never be used to control anything dangerous!

    Just look at JavaScript – and how bugs are continuously being found in real code due to closures. A crazy concept for a practical language; you never hear about the advantages of closures, just the bugs they cause (wonder why!).
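
    The classic example of such a closure bug is the var-in-a-loop pitfall, where every callback captures the same binding:

    ```javascript
    // Pitfall: all three callbacks close over the SAME variable i.
    var fns = [];
    for (var i = 0; i < 3; i++) {
      fns.push(function () { return i; });
    }
    var got = fns.map(function (f) { return f(); }); // [3, 3, 3], not [0, 1, 2]

    // Fix: give each iteration its own binding (let, or an IIFE in older code).
    var fixed = [];
    for (let j = 0; j < 3; j++) {
      fixed.push(function () { return j; });
    }
    var gotFixed = fixed.map(function (f) { return f(); }); // [0, 1, 2]
    ```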

    If it’s not clear what a language feature or construct does – and I mean crystal clear – then it’s a bad, very bad feature to let loose on the world of human programmers!

    I should add that static languages also suffer from dysfunctional and dangerous features, but they are usually more predictable, and an IDE/compiler can detect them and tell you off!

    My motto is ‘If it’s not simple, it’s stupid!’

    • Not sure why you’re bashing on closures. Almost every language has them now, they’re pretty much universally accepted as a good thing.

      • Brian permalink

        Just an example – perhaps it’s the way it’s done in JavaScript that makes closures one of the immediate things to look for when debugging someone else’s code!

        Not everyone :) It’s often the way they are implemented in the language that is the problem. A language should always be clear in what it’s doing. Closures in JavaScript are a minefield for the unwary, but JavaScript is a battlefield of bad design anyway!

        Security might also be a problem with dynamic languages; they offer greater opportunities for hacking. Not only is the shipped code vulnerable (static code is as well), but the software mechanism for running the dynamic language is an additional attack vector, or problems arise because it has just been updated, etc.

        Dynamic languages may well have their place, but static ones, at least for the moment, are safer.

  3. Randall permalink

    Maybe the reputation that dynamic languages can’t have tooling is like the old perception that they’d always be slow–that is, maybe it just reflects the current state of the art, and research and practical work could radically change it with time. Not that that’s a direction you want to go (esp. given IDE-ish tools aren’t scratching an itch for you personally), but maybe it is for someone(s).

    Maybe there’re approaches that can let tooling get further than it has, sort of how “make educated guesses, then be ready to deoptimize” let JITs do trickier things. Maybe you partly run the program to get some information and do analysis or use traces from past executions to guess the rest. Or when the programmer patches the guts of the language, they can also patch the analyzer used by autocompleters, etc. to teach it about the modified language. Maybe there’re ways tooling can work with humans to point out potential errors or refactor even when the human will sometimes have to fix its guesses. I know some dynamic-language tooling exists–like, I work in Python-land, and I know PyDev and Sourcegraph’s srclib guess some types–but there must be more general/powerful techniques out there waiting for folks with the cleverness and elbow grease to discover and implement them.

    (On the flipside, bringing some of the tuning JITs do to mostly-AOT-compiled code also fascinates me. Like, my AOT-compiled, mostly-statically-typed code would surely benefit if it could “realize” at runtime that a dynamic callsite just jumped to the same code a million times in a row, or if it could use runtime stats to guess the right initial size for a variable-length array, or many other such things. Part of the gap between AOT and JIT exists for hard-to-surmount practical reasons (like the constraints on mobile and WebAssembly you point out) but maybe part of it is historical and could someday change.)
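
    The “jumped to the same code a million times in a row” observation is essentially what inline caches exploit. A toy sketch of the idea (real JITs do this with hidden classes and patched machine code, not closures):

    ```javascript
    // Toy monomorphic inline cache: remember the last receiver type seen at
    // a call site, and skip the generic method lookup when it repeats.
    function makeCallSite(methodName) {
      let cachedCtor = null;
      let cachedFn = null;
      return function (receiver) {
        if (receiver.constructor === cachedCtor) {
          return cachedFn.call(receiver); // fast path: cache hit
        }
        // Slow path: generic lookup, then fill the cache for next time.
        cachedCtor = receiver.constructor;
        cachedFn = receiver[methodName];
        return cachedFn.call(receiver);
      };
    }

    class Dog { speak() { return "woof"; } }
    class Cat { speak() { return "meow"; } }

    const callSpeak = makeCallSite("speak");
    callSpeak(new Dog()); // slow path; cache now holds Dog.prototype.speak
    callSpeak(new Dog()); // fast path: same receiver type as last time
    callSpeak(new Cat()); // cache miss: re-lookup and re-cache
    ```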

    Apologies if I wrote something silly here or ignored obvious existing work. It’s clearly optimism and caffeine writing here, not expertise.

  4. ProgramSafe permalink

    Strange. If I understand correctly, the collective of programmers prefers static languages to dynamic ones because they can be trusted more.

    “More tools” looks incorrect. Java (dynamic) has a lot of tools. Rust (static) has very few tools.
    Tooling is more a function of language age and adoption.

  5. I wrote JavaScript for a long time, and JavaScript people mostly don’t use IDEs. As I came to the world of Clojure, I found that with an nREPL server running, an IDE would really be helpful; even simple editors with basic IDE features would also be useful. It’s slowly changing my mind.

    And it made me wonder why statically typed languages can be analyzed, but not dynamically typed ones. My friend told me Visual Studio supports JavaScript analysis (he’s a C# and JavaScript programmer). Is that solution too expensive, so that no one copies it?

    I’m working on a small project called Cirru, which aims to create a very simple syntax and structured graphical editors, to help us gather information from the AST rather than from text. I hope dynamic languages can become more powerful to use.

    • Brian permalink

      ‘Statically typed languages can be analyzed, but why not dynamically typed languages?’
      The answer is that they can – it’s just harder, and complicated by the fact that the dynamic engine, JIT or whatever, might be different from the one doing the analysing – but the code can certainly be analysed in terms of the dynamic language itself. For example, in JavaScript there is no good reason why an IDE could not highlight closures, or issue a warning, when there is a chance they are having an unintended side effect.

      Not using an IDE for any language is basically a crazy way of developing software, but I guess some of us software folk are just that :)

      • How about Clojure? It keeps side effects in a limited area; can we analyze Clojure code as we do Java code? Or do we have to make another language which is 99% similar to Haskell before we can finally make that happen?

        Personally, I will never be happy to build UIs the way PureScript writes the DOM, since that many keystrokes would slow down our work.

  6. While I use both kinds of languages and I believe that both AOT and JIT languages have their places, this sentence raises another concern for me:

    > Because it requires generating machine code on the fly.

    This makes my security side very uncomfortable. From an analysis perspective, this brings up all kinds of scary edge cases that are hard to handle.

    • Brian permalink

      And quite rightly so! Intercepting the source code or intermediate language is the first weakness, followed by not knowing the status of the software running the language or the JIT compiler.

      In some situations it might not matter, but they are becoming fewer and fewer!

    • Randall permalink

      I mean, JITs are hard, but so are AOTs; optimizing compilers have scary edge cases in general.

      See security dude Dan Bernstein (@hashbreaker)’s recent Twitter rant on how C compilers aggressively optimize code that invokes undefined behavior for an entirely different sort of scary optimizer edge case. Computers are hard!

  7. Maxime, with all your knowledge etc., how about “joining” an existing language and pushing it forward with your own ideas, instead of starting from scratch? Starting fresh of course gives you all the freedom, but as you write, there is not a high chance that it will become mainstream.

    However, in the dynamic languages field I like Rebol (it has a lot of rough edges etc.) for its beauty and cool concepts. The Rebol 3 interpreter is open-source, not a huge code base, and works, but a lot can still be done with it.

    Maybe give it a try and take a look at the code base. It would benefit a lot from your knowledge.

    • Randall permalink

      Freedom to try new stuff is a big deal when trying new stuff is your job as a researcher. I’d trust the expert to make good choices. :)

  8. TempleOS is 125,000 lines of code. I wrote all of it. I wrote a 20,000-line x86_64 compiler/assembler/unassembler that operates JIT and AOT. There is no linker. There is no make tool. To understand it, the user edits a file in the IDE and presses F5. There are no OBJ and no EXE files. The language is called HolyC. It is a dialect of C that has been improved.

  9. Have you looked at the Nim programming language at all? It compiles to C, and seems to benefit a lot from doing so.

  10. PuercoPop permalink

    Static analysis is not the only way to get good tooling. Instead, one could leverage reflection, as Smalltalk does. Hijacking the parser seems useful.

  11. David permalink

    Your dynamic JIT-compiled language with macros sounds a lot like Forth.

Trackbacks & Pingbacks

  1. Building a Small-Scale Prototype | Pointers Gone Wild
