
My Experience with the ESP8266 – Making an LED Strip I can Control from my Shell

IoT is one of those buzzwords that’s been thrown around so much that it’s become largely synonymous with disappointing marketing hype. Still, home automation, in principle at least, has a lot of potential. The ESP8266 chip came out about two years ago, and it’s been drawing a lot of interest from the “maker” community. This inexpensive chip, for those who don’t know, packs a wifi radio and a small 32-bit Xtensa core into a tiny package, and the small modules built around it add flash memory. This has gotten many people excited, because it means all kinds of electronic projects can be connected to wifi networks for only a few dollars in hardware cost.

I’ve known about the ESP8266 for a while, but until now, it wasn’t so interesting. Early ESP8266 modules exposed only a handful of I/O pins. You also needed to install a clunky vendor-provided SDK to program it, and the instructions were not straightforward. Think tons of dependencies and lots of boilerplate code. Thankfully, this isn’t the case anymore. It’s now possible to get a NodeMCU module on eBay for less than $3.50 shipped, or $7 on Amazon. This module has many I/O pins, its own 3.3V voltage regulator, its own USB interface for programming, and, best of all, it is programmable with the Arduino IDE.

I’ve recently completed two projects with the ESP8266. One of them is a wifi-enabled power outlet that can be remotely switched on and off. The other is an RGB LED strip whose color can be changed remotely. I work with a Linux machine at home, and needed to update my udev rules in order for the USB interface of my NodeMCU modules to be recognized so that I could program them. Besides that, the whole process has been almost seamless.

The components I used to interface a NodeMCU board with a 12V RGB LED strip

There is a simple “Hello World!” web server example that comes with the Arduino core package, and this example is fewer than 100 lines long. Making a wifi-enabled LED strip, on the software side, was a simple matter of parsing web request arguments for red, green and blue color values, and translating these into PWM intensity values to control output pins. Hardware-wise, I connected three 2N2222 transistors to output pins D0-D2; the transistors are able to handle the current required to drive my one-meter LED strip. The real beauty of it, though, is that I can control the LED strip with shell commands, by issuing HTTP requests, which opens up the realm of scripting home automation:

# Make it red!
wget -O-
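The color-to-PWM translation described above can be sketched roughly as follows. This is a Python illustration of the logic only, not the Arduino code used for this project; the query parameter names `r`, `g` and `b`, the 0-255 input scale and the 10-bit (0-1023) PWM range are all assumptions (older ESP8266 Arduino cores defaulted to a 1023 range, but the range is configurable).

```python
from urllib.parse import parse_qs

PWM_MAX = 1023  # assumed 10-bit PWM range; adjust to match the firmware


def color_to_pwm(query: str, pwm_max: int = PWM_MAX) -> tuple:
    """Turn hypothetical r/g/b query arguments (0-255) into PWM duty values."""
    args = parse_qs(query)
    duties = []
    for name in ("r", "g", "b"):
        raw = args.get(name, ["0"])[0]  # a missing channel means "off"
        try:
            value = int(raw)
        except ValueError:
            value = 0  # ignore garbage instead of crashing the server
        value = max(0, min(255, value))  # clamp to the 0-255 color scale
        duties.append(round(value * pwm_max / 255))
    return tuple(duties)
```

Under these assumptions, `color_to_pwm("r=128&g=0&b=255")` yields `(514, 0, 1023)`. Clamping and defaulting to zero keep a malformed request from driving the outputs to unexpected values.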


I intend to play with more of these in the future. There are now LED strips with individually-addressable LEDs, which seems like a lot of fun. I would also like to make a box with temperature, light and motion sensors that I can query remotely. If you’re interested in working with the ESP8266, I personally recommend buying a NodeMCU instead of a bare module. It will make your life much, much simpler.

Silicon Valley, Six Months Later

Maybe you’ve been wondering why I haven’t posted on this blog in over four months. The main reason is that my energy has been going elsewhere. In February, I defended my PhD thesis and relocated to the South SF Bay area (aka Silicon Valley) to work in tech. Most of my energy, since then, has been spent adapting to a new job and settling into a new environment. To be completely honest, it’s been difficult. I defended my thesis, prepared for an international move and started a new job all within one month. Hindsight is 20/20: if I had to do it over, I would take two or three months off after completing my degree to give myself a buffer to complete this transition.

Six months later, I have mixed feelings. I’m not sure how I feel about Silicon Valley. I picked an apartment in the South SF Bay because I wanted to be close to work. I’m able to bike to work, which is nice. However, as someone who grew up and lived all her life in a big city, the area feels very suburban and, well, boring to me. In all likelihood, part of the problem is that I don’t have many friends here. Once the work day is over, I often feel very lonely. It might sound cheesy, but this is probably what I regret most about leaving my hometown: leaving my friends and family behind.

I try to look at the bright side: I’m happy with my job. I like my boss and my coworkers. People here work reasonable hours and the work is challenging. I have my own place for the first time in my life. It’s a big apartment in a quiet neighborhood, and a comfortable place to live. Silicon Valley is a huge tech hub. It’s a good opportunity for me to learn a lot. Because I don’t go out much these days, I’ve actually made a decent amount of progress on my programming language project in my spare time. I also installed an electric motor on my bike and completed some electronic projects. These things give me some satisfaction. Still, loneliness is hard, and I don’t quite feel at home here yet.


Optimizing Ray Marching through Partial Evaluation

I love ray tracing as a 3D rendering technique. I love it because in its purest form, it’s very elegant and beautiful. It’s a simulation of light and photons flowing through an environment. Because of its physically-based nature, ray tracing can produce very realistic images from very simple code. You can render impressive scenes with just a few hundred lines of C. This is in stark contrast with the state-of-the-art polygon rasterization techniques used in modern 3D engines, which rely on a huge number of very complex hacks to get something that tries to look somewhat realistic.

Traditional Whitted-style ray tracing is done with intersection tests, that is, the scene is rendered by testing if rays coming out of the camera and going into the scene intersect with various objects. With this style of rendering, in order to be able to render a 3D object, you need to be able to write a piece of code that checks whether or not a given ray intersects with the object, and if so, where the nearest intersection occurs (there might be multiple). This is relatively straightforward if you want to render something like a sphere, but it quickly gets more complicated when rendering more complex parametric surfaces.
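For the sphere case, the intersection test reduces to solving a quadratic in the ray parameter t. A minimal Python sketch, assuming a unit-length ray direction and plain 3-tuples for vectors (the function name is illustrative):

```python
import math


def ray_sphere_intersect(origin, direction, center, radius):
    """Nearest t >= 0 where origin + t * direction lies on the sphere, else None.

    Assumes `direction` has unit length, so the quadratic's leading term is 1.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * x for d, x in zip(direction, oc))  # half the linear coefficient
    c_term = sum(x * x for x in oc) - radius * radius
    disc = b * b - c_term
    if disc < 0:
        return None  # the ray misses the sphere entirely
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):  # near hit first, then far hit
        if t >= 0:
            return t
    return None  # the sphere is entirely behind the ray origin
```

Even this simple case has to sort out two candidate hits and the ray starting inside the sphere; tori or general parametric surfaces lead to higher-degree equations or iterative root finding.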

Recently, I fell in love when I discovered a ray tracing technique that is, in many ways, both simpler and more powerful than classical ray tracing. The technique, known as ray marching, or sphere tracing, makes it possible to render a scene that is directly defined by a Signed Distance Function (SDF). This is a function that, given a point in space, returns the distance from that point to the closest point that is part of the scene. The distance is positive if the point is outside of scene objects, zero at the boundary, and negative inside scene objects.
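The marching loop itself is short. A minimal sphere tracing sketch in Python, assuming the SDF returns the exact distance or a lower bound on it, a unit-length ray direction, and illustrative values for the step limit, hit threshold (`epsilon`) and maximum distance; the example sphere is hypothetical:

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Example SDF: a unit sphere five units in front of the origin."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=128, epsilon=1e-4, max_dist=100.0):
    """March along a unit-length ray; return the hit distance t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < epsilon:
            return t  # close enough to the surface to count as a hit
        t += dist  # safe step: no surface lies within `dist` of p
        if t > max_dist:
            break  # the ray has left the scene
    return None
```

Each step advances by the SDF value, which cannot overshoot a surface because nothing in the scene is closer than that distance. Near a surface the steps shrink until the distance drops below `epsilon`.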

The beauty of this is that SDFs are extremely simple to define. For instance, the signed distance from a point to a sphere is simply the distance between the point and the center of the sphere, minus the sphere’s radius. Better yet, SDFs can be combined together in very simple ways as well. You can produce the union of two objects by computing the minimum of both of their SDFs, and the intersection by computing the maximum. You can also do funky things such as infinitely repeating objects using modulo operators. SDFs truly have a lot of expressive power due to their combinatorial nature.
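These operations translate almost directly into code. A small sketch with a hypothetical example scene; repetition is shown along the x axis only. Strictly speaking, `min` and `max` combinations can give a bound on the distance rather than the exact distance in some regions, which ray marching tolerates.

```python
import math


def sd_sphere(p, center, radius):
    """Signed distance to a sphere: distance to the center minus the radius."""
    return math.dist(p, center) - radius


def op_union(d1, d2):
    return min(d1, d2)  # inside if inside either object


def op_intersection(d1, d2):
    return max(d1, d2)  # inside only if inside both objects


def op_repeat_x(p, period):
    """Fold space along x so that one object repeats every `period` units."""
    x, y, z = p
    return ((x + 0.5 * period) % period - 0.5 * period, y, z)


def scene(p):
    """Hypothetical scene: two unit spheres plus an infinite row of small spheres."""
    pair = op_union(sd_sphere(p, (-2.0, 0.0, 0.0), 1.0),
                    sd_sphere(p, (2.0, 0.0, 0.0), 1.0))
    row = sd_sphere(op_repeat_x(p, 4.0), (0.0, -3.0, 0.0), 0.5)
    return op_union(pair, row)
```

The infinite row costs a single sphere evaluation per sample, because the modulo maps every point back into one cell before the SDF is evaluated.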

The demoscene group Mercury has produced some amazing demos of 3D animations based on the rendering of SDFs on the GPU. These demos are entirely procedural and fit in a tiny 64 kilobytes! The Timeless demo, in particular, showcases the kinds of spatial transformations and deformations of space that can be done when your scene is entirely defined by a mathematical function, as opposed to polygons. There are many interesting SDF code examples on ShaderToy if you’re interested in playing with them.

Hopefully, by this point, I’ve convinced you that SDFs and ray marching are insanely cool. It’s also pretty amazing that modern GPUs are now fast and flexible enough that you can render fairly interesting SDFs in real-time. Unfortunately, SDFs remain expensive enough to evaluate that complex scenes are still tricky to render in real time. I think it would be great to live in a world where SDFs and real-time ray tracing could replace polygons, but it seems we aren’t quite there yet.

I spent some time thinking about SDFs, and it got me thinking: there’s got to be a way to make these faster. If you think about it, rendering an SDF is costly because there’s quite a bit of computation going on at every pixel: many evaluations of a potentially complex SDF. This computational work, however, is highly redundant. Do you really need to evaluate the entire SDF for every single pixel? Ray marching is very much a brute force technique. There just has to be a more efficient way to go about this.

SDFs are mathematical functions, pieces of code that get repeatedly evaluated. I come from a compiler background, and so I started wondering if, potentially, compiler optimizations could be applied to these SDFs. What if, for instance, we could apply partial evaluation to optimize these distance functions and make them run faster? I did some research, and it turns out, unsurprisingly, that I wasn’t the first one to think of such a concept. There has already been work on applying partial evaluation to ray tracing, and on applying partial evaluation to the optimization of shaders.
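To make the idea concrete, here is a toy partial evaluator over a tiny SDF expression tree, in Python for illustration. It evaluates every subexpression whose inputs are known ahead of time (the "static" values) and emits the rest as residual code. The tuple-based expression format, the operator set and all variable names are hypothetical; a real shader specializer would work on a much richer IR.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul,
       "min": min, "max": max}
INFIX = {"add": "+", "sub": "-", "mul": "*"}

# Hypothetical SDF: distance to a sphere whose radius is inflated by an offset.
SPHERE_WITH_OFFSET = ("sub", ("var", "dist_to_center"),
                      ("add", ("var", "radius"), ("var", "inflate")))


def specialize(expr, static):
    """Fold everything computable from `static`; keep the rest as an expression."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        return ("const", static[expr[1]]) if expr[1] in static else expr
    a, b = specialize(expr[1], static), specialize(expr[2], static)
    if a[0] == "const" and b[0] == "const":
        return ("const", OPS[kind](a[1], b[1]))  # both inputs known: evaluate now
    # Simple algebraic identities that make the residual code smaller.
    if kind == "add" and a == ("const", 0.0):
        return b
    if kind in ("add", "sub") and b == ("const", 0.0):
        return a
    if kind == "mul" and a == ("const", 1.0):
        return b
    if kind == "mul" and b == ("const", 1.0):
        return a
    return (kind, a, b)


def to_source(expr):
    """Print a residual expression as shader-like source text."""
    kind = expr[0]
    if kind == "const":
        return repr(float(expr[1]))
    if kind == "var":
        return expr[1]
    a, b = to_source(expr[1]), to_source(expr[2])
    return f"({a} {INFIX[kind]} {b})" if kind in INFIX else f"{kind}({a}, {b})"
```

Specializing `SPHERE_WITH_OFFSET` with `radius = 1.0` and `inflate = 0.5` leaves `(dist_to_center - 1.5)`: one subtraction per evaluation instead of two operations and two parameter lookups.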

The work that’s been done on applying partial evaluation to ray tracing, in my opinion, doesn’t go far enough. The authors have essentially partially evaluated the ray tracing algorithm and specialized it according to the objects present in a given 3D scene. What I think would be very useful is to specialize the evaluation of SDFs according to the camera position. What if, for instance, you could mathematically prove that certain objects in your scene are not present in the left half of the view frustum? That is, what if you could prove that most of the objects in your scene are not visible on the left half of the screen?

It seems like it would be relatively straightforward to recursively subdivide the image and view frustum into multiple quadrants and determine which objects will definitely not be visible in each. I know it can be done, in fact, directly by using the definition of SDFs, without very fancy tricks. If you could do this, then you could generate a separate, optimized SDF for smaller fractions of the image to be rendered. What I’m essentially proposing is to generate optimized pieces of shader code on the fly for smaller areas of the screen, which can be evaluated much faster than the SDF for an entire scene.
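A rough Python sketch of that subdivision, under heavy assumptions: the camera sits at the origin looking down +z with a focal length of 1, every object comes with a bounding sphere, and each screen tile is enclosed in a cone. Note that this swaps in a conservative cone-versus-bounding-sphere test instead of reasoning directly from the SDFs, and `sd_obj{i}` in the emitted code is a hypothetical per-object shader function.

```python
import math
from functools import reduce

# Hypothetical scene: (bounding-sphere center, radius) per object.
EXAMPLE_OBJECTS = [((-3.0, 0.0, 5.0), 1.0), ((3.0, 0.0, 5.0), 1.0)]


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def _angle(a, b):
    """Angle in radians between two (not necessarily unit) vectors."""
    cos = sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, cos)))


def tile_cone(u0, v0, u1, v1, focal=1.0):
    """Cone from the camera enclosing the screen tile [u0, u1] x [v0, v1]."""
    axis = ((u0 + u1) / 2, (v0 + v1) / 2, focal)
    corners = [(u, v, focal) for u in (u0, u1) for v in (v0, v1)]
    # For a convex tile on the image plane, the widest angle is at a corner.
    return axis, max(_angle(axis, c) for c in corners)


def may_be_visible(cone, center, radius):
    """Conservative: False only if the bounding sphere lies outside the cone."""
    axis, half_angle = cone
    dist = _norm(center)
    if dist <= radius:
        return True  # the camera is inside the bounding sphere
    return _angle(axis, center) <= half_angle + math.asin(radius / dist)


def specialize_tiles(objects, tile=(-1.0, -1.0, 1.0, 1.0), depth=2, candidates=None):
    """Quadtree over the screen; returns (tile, indices of possibly visible objects)."""
    if candidates is None:
        candidates = range(len(objects))
    cone = tile_cone(*tile)
    live = [i for i in candidates if may_be_visible(cone, *objects[i])]
    if depth == 0 or not live:
        return [(tile, live)]
    u0, v0, u1, v1 = tile
    um, vm = (u0 + u1) / 2, (v0 + v1) / 2
    result = []
    for sub in ((u0, v0, um, vm), (um, v0, u1, vm), (u0, vm, um, v1), (um, vm, u1, v1)):
        # A sub-tile sees a subset of its parent's rays, so pruning is inherited.
        result.extend(specialize_tiles(objects, sub, depth - 1, live))
    return result


def emit_sdf(indices):
    """Emit shader-like source for the union of the surviving objects only."""
    if not indices:
        return "1e30"  # nothing visible in this tile: distance is effectively infinite
    terms = [f"sd_obj{i}(p)" for i in indices]
    return reduce(lambda acc, term: f"min({acc}, {term})", terms)
```

With the two example spheres, every tile on the right half of the screen ends up with a specialized SDF that only mentions `sd_obj1`, and vice versa on the left. A tighter test, for example one derived from the SDFs themselves, would prune more aggressively.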

I don’t know how viable it is to compile and run fresh shader code on a GPU every time the camera moves, but I believe it might actually be possible, using this kind of optimization, to render SDFs on a multicore desktop CPU at decent frame rates.


Sense from Chaos – Crossing the Semantic Gap

Edsger Dijkstra once said:

“The question of whether machines can think is about as relevant as the question of whether submarines can swim.”

This was the view most AI researchers held back in the 1960s and 1970s. Back then, many thought that general-purpose AI could be achieved purely through symbolic manipulation. That is, it was thought that we could build machines that, through purely logical reasoning, would derive a sufficient understanding of the world to reach and even exceed human intelligence. This kind of vision of AI is illustrated in classic sci-fi novels, and embodied in the infamous HAL 9000 computer of 2001: A Space Odyssey.

The quest to build symbolic AI led to much research, and impressive early successes. These successes led to great optimism, and the belief that computers would be able to effectively handle human language, machine translation and vehicle driving before 1980. Symbolic reasoning is well suited to reasoning about mathematics, or about small closed systems with very well-defined properties, such as the game of chess. Unfortunately, multiple dead ends were quickly reached. It was found, as philosophers had predicted, that you could hardly build a machine capable of reasoning about the real world through purely symbolic means. The problem was one of grounding.

Symbols in a vacuum don’t mean anything. You can create an ontology in which you define the concept of a chair, and you can put logical statements in this ontology as to how people use chairs for sitting on, and that chairs are movable objects that behave according to Newtonian laws of physics. However, if you’re trying to build a robot that can see and manipulate chairs, your ontology is basically worthless. You have two HD cameras providing you with millions of pixels of raw data at 60 frames a second, and no logical statements are sufficient to help you tell where a chair might be in all that noise. Your machine has the concept of a chair, but this concept exists only in some Platonic realm that is completely divorced from the real world.

Ultimately, the grandiose predictions of early AI researchers proved much too ambitious, because the real world was much too difficult to cope with, or at least, much more difficult to cope with than mathematicians had hoped. This led to the first AI winter, with funding rapidly drying up for what is now known as GOFAI (Good Old Fashioned AI) research. For at least two decades after this, AI became kind of a dirty word in computer science circles. Those who still pursued AI-related research renamed their field machine learning, so as to avoid any association with the overhyped goal that we might one day build machines with human-level intelligence. From then on, machine learning researchers left behind the lofty goals of AI, and instead focused on the basics: small narrowly-defined learning tasks where they knew they could make headway.

Neural networks are not new, but it’s only in the last few years that they have started to really shine, with deep learning. Advances in algorithms, access to large datasets and the unprecedented availability of computational resources have made it possible to scale this machine learning technique to networks with many layers of depth. Recently, some impressive feats have been achieved with deep neural networks, including object classification that exceeds human performance, and of course the much-discussed victory of the computer program AlphaGo over human Go champion Lee Sedol. Some amazing projects such as neural artistic style transfer and deep networks that learn to synthesize new images have also surfaced.

At the moment, universities are seeing a huge increase in student interest in deep neural networks, with classes sometimes tripling in size over previous years. Governments and industry alike are pouring billions into deep learning research. It’s undeniable: we’re in the middle of a machine learning boom. The optimism is so great that well-known researchers such as Yann LeCun and Yoshua Bengio are even daring to use the word AI again. There’s a lot of excitement (and some fear) about what deep neural networks will bring. People are becoming scared of robots taking human jobs, and the question of when computers will reach human-level intelligence is being asked.

To me, it really is a question of when. Provided that humanity doesn’t sink into a post-apocalyptic dark age, there is no doubt in my mind that machines can and will reach human intelligence. I just don’t think that we’ll get to Artificial General Intelligence (AGI) in the way that most people think we will. Many seem to believe that we just haven’t come up with the right algorithm for human-equivalent intelligence, or that we just don’t have enough computational power. Clearly, to reach human-level intelligence, what we need is a deep neural network with a number of connections equivalent to that present in the human brain, right? I don’t think so. I don’t think that’s what we’re missing.

In my view, the human brain is not running some single algorithm. The human brain is not just a huge wad of neurons, a huge uniform neural network. The human brain is made of many different specialized components that do different things, connected together by a variety of pathways. Deep neural networks are awesome, they’re an amazing achievement, but they’re just one piece of the puzzle. What deep neural networks do is solve the perceptual problem. Deep learning allows us to do what the human visual cortex does. You get a million pixels of color information as input, and you turn this raw mass of data into a few classes of output. In short, with deep neural networks, we can turn real-world data into symbols.

There is no doubt in my mind that an AGI has to contain some sort of connectionist component, such as a neural network within it. To make useful AI programs however, the logical thing to do seems to be to assemble multiple specialized pieces together. In my view, AlphaGo is a beautiful illustration of this. It melds together multiple deep neural networks, which are used to do things such as assigning a value to different Go board configurations, along with a Monte Carlo tree search algorithm for looking at possible future moves. AlphaGo is very much a marriage of GOFAI techniques together with the power of deep neural networks. Deep networks make sense of complex board configurations without the need for hand-written logical rules as to how individual Go stones should be counted. Deep networks do the more intuitive perceptual work, and good old fashioned tree search does the logical reasoning based on this data.

Deep neural networks bridge the semantic gap between classical computer systems, where symbolic entities are defined in absolute terms in databases, and the fuzziness of the real world, where exceptions are the norm, and nothing can be entirely captured by absolute rules. If I had to guess, I would speculate that robots of the future, as they become more intelligent, are going to feature increasingly complex AI architectures made of multiple specialized components. There will be deep networks that do vision, hearing, perceive human faces and facial expressions. There will also be modules that do planning, navigation, logical/ontological reasoning and reasoning by analogy. All of this AI technology that we’ve been developing for the last 60 years is going to need to come together into a whole. That is one of the next big challenges. Making AGI happen won’t just be scientific work, it will also pose a plethora of engineering challenges.

I think it’s possible to build useful robots and human-equivalent AI without understanding the human brain in depth. However, I think that when we do build such AI, we’re necessarily going to converge towards an architecture that does many things in ways that are actually quite similar to what happens, in computational terms, inside the human brain. Submarines may not swim, but in order to move forward, they still have to displace water. It’s said that humans can only keep about 7 items of data in their working memory. It wouldn’t surprise me all that much if, one day, neuroscientists manage to map out the human frontal cortex, and they discover that in this region of the brain, there are neurons laid out in a system that implements what can essentially be thought of as a small set of general-purpose registers, each able to store a neural vector representing a semantic concept. In other words, our ability to reason by analogy and to manipulate abstract concepts in our mind is probably embodied by neural machinery that serves to perform symbolic manipulations.

Moving to the Valley

I haven’t talked much about my research work on this blog recently. That’s because I’ve been busy writing my thesis and wrapping things up. The good news is that I had my thesis defense two weeks ago, and I’ve completed the requirements to get my PhD. It feels a little surreal to have finally reached this goal, after six and a half years. The next step, for me, is a transition to working in the industry. I considered doing a postdoc after my PhD, but it’s become fairly clear to me that although I do enjoy research, I can’t stand the way academia operates, and the focus on minimum publishable units.

At the moment, I’m in the process of selling, giving away and packing up my things in Montreal. I signed a job offer with Apple. I chose to work with them in large part because their recruitment process was very smooth and human, for lack of a better word. The people I interviewed with seemed relaxed, and like they genuinely enjoyed their work. I know from experience that this is not the case everywhere. I’ve had interviews at tech companies that run giant cubicle farms, with people running everywhere and interviewers who looked nervous and overcaffeinated. Apple is quite secretive, and I don’t think I’m allowed to say much about my job description, but I will be working on compiler technology, and developing my skills in this area further, which is what I wanted.

I will be moving to the bay area in early March. It pains me to leave behind Montreal, the city where I was born and raised, as well as all of my friends here, but it’s also exciting to be moving to Silicon Valley, the Holy Mecca of the technological world, and to get a chance to work on cutting edge technology. It will be nice to be able to meet more people who share similar interests on a regular basis. As someone who gets a bit emotionally down during the Canadian winter, it’s also uplifting to contemplate living in California, where there is plenty of sunlight and fresh produce all year long.

One of the decisions I have to make is where I’m going to live. Apple is based in Cupertino, which is about 45 miles from San Francisco. They have a shuttle which goes to the city, but that means two or more hours of commuting every weekday. I’ve lived in the city all my life, and I know that San Francisco is way more hopping than the bay area suburbs, but I find myself thinking that I should probably make my home closer to the Apple offices. If I live in the suburbs, I could probably bike to work every morning, a net positive for my health (whereas two hours of sitting in a bus would be a net negative). The suburbs also have slightly less insane rents than San Francisco, which means I could afford to have my own sweet 700 square feet of living space.

For those wondering, I do plan to keep on running this blog. I don’t know how much free time I can realistically expect to have, and how much energy I’ll be able to dedicate to coding after work hours, but I will try to keep advancing on my personal projects, and keep using this blog to discuss various topics I find interesting.

Building a Small-Scale Prototype

I’ve blogged before about my plans to create a programming language of my own. I have a long list of features I’d like to experiment with. I want to incorporate features that are not commonly found in other languages. For instance, the language is going to have an extensible grammar. I’d like to have program images that can be suspended and resumed like in Smalltalk. I’m also trying to sketch out a JIT compiler that compiles itself. Unfortunately, so far, I’ve spent more time thinking about the design and implementation of this language than writing any code. I wrote a decent amount of code, then ended up scrapping it and starting over, only to think that I should maybe start over again.

Part of the problem, I think, is that I keep wanting to plan 20 steps ahead. My language, zeta, is going to have a few features which are fairly uncommon/experimental, such as the extensible grammar. I don’t know ahead of time precisely how these features are going to work out as part of the overall design. There are many micro-decisions to make, things that could be implemented this way or that way, and that’s where I keep getting stuck. I keep worrying about issues such as performance, garbage collection, multithreading, and how different language features will integrate with one another. Inevitably, I end up having to admit that there are simply a lot of important questions I don’t yet have an answer to. Language design is hard, and so is compiler design.

I think that this is a common problem with ambitious engineering projects. When you’re trying to innovate (or just do things differently) on ten different fronts at the same time, there are simply too many unknowns, too many degrees of freedom. This is why innovation is typically more incremental than revolutionary. People build on trusted foundations, and maybe they add one or two cool new features on top. When you try to reinvent everything, there are just too many problems to be solved at once. Solving multiple problems simultaneously is problematic, because it gets in the way of building your product. You want to create something that’s globally optimal, as good as you can make it, but you keep having to make technical decisions which you simply don’t have enough information to make. Each of these decision points stalls you a little bit, because making the optimal technical decision requires taking the time to pause and gather more data. Worse yet, not all of these technical decisions are independent, some are simply incompatible with one another. Choices you make now will push you down a certain path later down the road. Sometimes, you make a certain set of choices, and later on you realize that you went down the wrong path, and you have to backtrack.

I keep wanting to do the ambitious thing, and getting into complex, elaborate designs as a result. I start building, and I often end up backtracking. I’ve decided to take a step back and ask myself some important questions, such as “Why do I even want to create a programming language?” and “What am I getting out of this?” One of the things I’ve come to realize is that creating a programming language as a hobby project for the simple joy of doing it, and creating a programming language which I hope will eventually become popular and dethrone Python are two very different projects with very different requirements. Creating a programming language for fun is a project with very loose requirements and much freedom to explore, motivated by the joy of programming I discovered when I was a teenager (and hope to keep alive for a long time). On the other hand, creating a language in the hope of whacking Python in the gonads with a 2×4 is a project likely motivated by my inflated ego, and my failure to integrate mindfulness meditation into my daily routine.

Designing and implementing a popular programming language means I have to care about many issues such as:

  • How learnable/palatable the syntax is to the masses
  • How beginner-friendly the language is
  • Integration with C/C++ code
  • Deployment, how easy it is to install my compiler/VM
  • Being able to generate binary executables
  • Multi-platform support, including mobile devices
  • Unicode support and internationalization
  • The absolute performance of my language
  • Multithreading and parallel performance
  • Security of my implementation (exploits)
  • Responding to the wants and needs of the community

All of these issues, and more, necessarily have a strong influence on the design of a language. They limit my ability to freely explore language design. Most importantly, creating a language with popularity in mind means that this project becomes very much about the wants and needs of the community. With so many limitations on my ability to pick the language design I want, and a huge community to care for (and work for), the project starts to look a lot like a job rather than a fun hobby. Hence, if I want to explore language design, maybe I ought to go about this project differently, scale down my ambitions, and go for something less grandiose.

I’ve decided that I’m probably not ready to build a commercial/popular language. Instead, I’m building a small-scale prototype. I’ve asked myself “What’s the most important language feature I want to experiment with?” The answer is the extensible grammar. What I’m doing, then, is building a sort of Minimum Viable Product (MVP). A tiny minimalist language with just the features I need, so that I can get to the interesting part of experimenting with an extensible grammar as soon as possible. Right now, this language is interpreted only, and it has no garbage collector (it never deallocates). Even this, I’m finding out, is challenging. Even this tiny project involves solving many technical problems, but things are already much simpler, and most importantly, much more fun. Once I have the extensible grammar working well, I’ll move on to adding other features, and exploring other equally interesting technical problems.
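For readers wondering what an extensible grammar can look like in practice, one common approach is a Pratt (operator-precedence) parser whose operator table can be extended while the program runs. The Python sketch below only illustrates that general idea and is not zeta's actual design; the class and method names are hypothetical, and it evaluates as it parses instead of building a syntax tree.

```python
import operator


class ExtensibleParser:
    """Tiny Pratt parser whose infix operators can be registered at runtime."""

    def __init__(self):
        self.infix = {}  # operator symbol -> (binding power, function)

    def add_infix(self, symbol, power, func):
        """Register (or redefine) a left-associative infix operator."""
        self.infix[symbol] = (power, func)

    def parse(self, text):
        self.tokens = self._tokenize(text)
        self.pos = 0
        value = self._expr(0)
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.tokens[self.pos]!r}")
        return value

    def _tokenize(self, text):
        symbols = sorted(self.infix, key=len, reverse=True)  # longest match wins
        tokens, i = [], 0
        while i < len(text):
            if text[i].isspace():
                i += 1
            elif text[i].isdecimal():
                j = i
                while j < len(text) and text[j].isdecimal():
                    j += 1
                tokens.append(int(text[i:j]))
                i = j
            elif text[i] in "()":
                tokens.append(text[i])
                i += 1
            else:
                sym = next((s for s in symbols if text.startswith(s, i)), None)
                if sym is None:
                    raise SyntaxError(f"unknown symbol at position {i}")
                tokens.append(sym)
                i += len(sym)
        return tokens

    def _next(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def _primary(self):
        tok = self._next()
        if tok == "(":
            value = self._expr(0)
            if self._next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if isinstance(tok, int):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def _expr(self, min_power):
        left = self._primary()
        while self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            if isinstance(tok, int) or tok not in self.infix:
                break  # e.g. ')' ends this subexpression
            power, func = self.infix[tok]
            if power <= min_power:
                break  # let the caller bind it: gives left associativity
            self.pos += 1
            left = func(left, self._expr(power))
        return left


def make_arithmetic_parser():
    p = ExtensibleParser()
    p.add_infix("+", 10, operator.add)
    p.add_infix("-", 10, operator.sub)
    p.add_infix("*", 20, operator.mul)
    return p
```

After `p.add_infix("max", 15, max)`, the same parser accepts `1 + 7 max 3 * 2`, placing the new operator between `+` and `*` in precedence. Real extensible grammars go much further (new statement forms, new literals), but the principle of a grammar table that user code can modify is the same.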

Exercise, Depression and the Energy Hypothesis

There’s an increasing body of evidence suggesting that exercise can help alleviate symptoms of depression. Some studies claim that exercise is just as effective as prescription antidepressants, and possibly even more so. The question I ask myself is: why is exercise helpful in cases of depression? It’s universally accepted that exercise is good for you in many respects. So much so, that we rarely stop to ask ourselves why that might be.

I’ll begin by saying that depression is a complex phenomenon. I don’t personally believe that depression is some one disease or disorder with some fixed set of symptoms or some single cause that we can point to. Rather, I believe it’s an umbrella term that we generally use to describe people who experience persistently low moods and a lack of energy or motivation for what seems like an abnormally long amount of time. In this post, I’m going to focus on one component or symptom of depression known as motivational anhedonia: the loss of desire or motivation to engage in activities.

It seems that one of the areas where exercise is most helpful is in helping people find more energy and motivation. I’m going to propose a simple hypothesis as to why that might be, which is that your brain has evolved to save and accumulate energy. It wants you to stay alive and hopefully spread your genes, but ideally, it wants you to do this while spending the least amount of energy possible. Your brain would rather have you accumulating energy than spending it. Body fat is an insurance policy: saved up energy to safeguard you against the bitter famine that might be just around the corner. The reason why many people find it so difficult to lose weight is likely that our brains don’t really want to let us do so.

When the winter comes around, and the days get shorter, many people experience Seasonal Affective Disorder (SAD), which can cause lower moods and difficulty getting out of bed in the morning. Here’s the thing though: it’s probably natural for you to feel tired, to move less and to spend more time sleeping during the winter. Most other mammals probably feel just the same way. This is simply your instinct trying to keep you from running around and jumping everywhere, trying to make you save your energy, because in the winter, there’s much less food for you to find. Sleeping more and moving less might not feel very good, but it’s probably the safe thing to do when resources are scarce.

How does exercise fit into the picture? I propose that your motivation to move around and do things reflects your brain’s willingness to let you spend energy. I propose that probably, somewhere in your brain, there’s some mechanism which measures how much energy you need to spend to stay alive, a kind of daily energy budget. This energy budget is likely based on how much energy you’ve needed to stay alive in the past. Exercising involves forcing yourself to spend more energy than usual. When you do this, your brain estimates that your life situation requires you to spend more energy to stay alive, and increases your energy budget.

This hypothesis might help explain why, despite the abundance of food and entertainment found in Western societies, depression is on the rise. By many accounts, living conditions in North America are more comfortable than they've ever been. Maybe that's part of the problem, though. Maybe the comforts of modern civilization (cars that get us around without walking, entertainment we can enjoy while sitting on the couch, and microwaveable food we don't even need to cook) have made our lives too easy. These things have made it possible for us to survive without spending much energy or moving very much. Maybe, for many people, the way to find more energy is to spend more energy.

Interestingly, there's also been some recent research suggesting that intermittent fasting and ghrelin (the hormone that produces the feeling of hunger) might stimulate neurogenesis. Since reduced neurogenesis has been proposed as one of the factors behind depression, it's possible that intermittent fasting might help combat depression. This also fits with the energy hypothesis: when there is a calorie deficit and energy sufficiency isn't guaranteed, the brain becomes motivated to have us spend energy so that we might find more food.

I'll conclude by saying that I do not, in any way, mean to trivialize depression or the suffering that's involved. I've been depressed, and I know very well that the last thing you feel like doing, when it seems like your world is crumbling down, is to hop on a treadmill and run. I also know that the last thing you should tell a depressed person is to "just snap out of it". If you have a depressed friend and you want to help them, my advice is to offer them support, kindness, and a lot of patience. Lastly, exercise and diet can only go so far. Depression is not one disease, and exercise is not some universal cure. Some people cannot get out of depressed states without the help of professionals and prescription antidepressants. These people need help and support just as much.

Musings on AOT, JIT and Language Design

Lately, I’ve been pondering the strengths and weaknesses of static and dynamic languages. In particular, the reaction of the online programming community to Elben Shira’s post The End of Dynamic Languages made me realize that many (most?) programmers out there seem to prefer static languages. This seems to be largely because static languages usually come with better IDE and tool support, as well as static guarantees which programmers value highly (and rightly so).

I’m slowly working on a design for a programming language of my own, and I had a “wait a minute, let’s take a step back” kind of moment. Why? Because the language I was working on was going to be extremely dynamic. It was going to be JIT compiled, dynamically typed, with an extremely powerful macro system based on the ability to hijack and modify the parser while a program is being parsed. I was also considering using a self-modifying Intermediate Representation (IR) as part of the design of the JIT compiler.

Elben’s post made me take a step back because it made me realize that the design of my ultra-dynamic language would make it near-impossible to have much tool support. If you can hijack the parsing of a program while it’s being parsed, and alter the parser using custom code you wrote, then good luck running any kind of static analysis. It’s just not going to work.

I saw that I was being myopic. I use a bare-bones text editor and no tools, and I've only used a debugger a handful of times in the last few years. That's the way I program, but it's not the way most people program. I can get away with this because the projects I've worked on have all been under 100K lines of code. Designing a language that makes any kind of static analysis, refactoring assistance or autocompletion near-impossible is a sure way to guarantee the language won't gain adoption outside of hobbyist circles.

Then there's the issue of JIT compilation. I very much like JIT compilers because that's where my expertise lies; I've written my own JIT as part of my PhD thesis. Still, I have to keep an open mind and ask myself what's best for my language. JIT compilers are fundamentally more powerful than AOT (Ahead-Of-Time, or static) compilers in terms of optimization capabilities, because they can adapt the way they optimize code based on the behavior of running programs. They're great for dynamic languages.

Unfortunately, it seems that the world, at the moment at least, is stepping away from JIT compilation. It seems to me that JIT compilation is impractical if you want to support mobile platforms or compile to WebAssembly. Why? Because it requires generating machine code on the fly, and at the moment, it's unrealistic to just plop a custom binary with self-modifying machine code on the Apple or Android app stores.

I’m thinking I might just design my language so that it compiles to plain old C. This would make it possible to leverage the C compilers that already exist for every platform out there. Rest assured though, this language will be dynamically typed ;)
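As a rough illustration of what compiling a dynamically typed language to C could look like, here is a minimal Python sketch of a translator for a toy expression language. It assumes a hypothetical C runtime that provides a boxed `value_t` type and helpers named `dyn_int`, `dyn_str` and `dyn_add`. None of these names come from an actual design, and the tuple-based AST format is made up for the example.

```python
# Toy translator from a dynamically typed expression language to C source.
# Assumption: the generated C links against a hypothetical runtime providing
# a boxed `value_t` type and the helpers dyn_int, dyn_str and dyn_add.
from typing import Union

# An expression is an int literal, a str literal, or a tuple node:
#   ("var", name) or ("add", lhs, rhs)
Expr = Union[int, str, tuple]


def c_string_literal(s: str) -> str:
    # Escape backslashes and double quotes so the result is a valid C literal.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def compile_expr(e: Expr) -> str:
    """Return C source for e. Every value is boxed; types are checked at runtime."""
    if isinstance(e, int):
        return f"dyn_int({e})"
    if isinstance(e, str):
        return f"dyn_str({c_string_literal(e)})"
    op = e[0]
    if op == "var":
        # Variables are already value_t in the generated C code.
        return e[1]
    if op == "add":
        # Operand types are unknown at compile time, so dyn_add must
        # inspect the runtime tags (int + int, str + str, ...) itself.
        return f"dyn_add({compile_expr(e[1])}, {compile_expr(e[2])})"
    raise ValueError(f"unknown expression: {e!r}")


def compile_assign(name: str, e: Expr) -> str:
    """Emit a C declaration binding a dynamically typed variable."""
    return f"value_t {name} = {compile_expr(e)};"


if __name__ == "__main__":
    print(compile_assign("y", ("add", 1, ("var", "x"))))
```

Running it prints `value_t y = dyn_add(dyn_int(1), x);`. Because types aren't known at compile time, every operation becomes a call into the runtime, which checks type tags while the program runs. Instruction selection and register allocation are left to whatever C compiler already exists for the target platform, which is the main appeal of this approach.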

Have Static Languages Won?

A few days ago, Elben Shira caught the attention of the programming blogosphere with his post entitled The End of Dynamic Languages. The key point from this post is in the following statement:

This is my bet: the age of dynamic languages is over. There will be no new successful ones.

Like him, I've noticed that despite the fact that there has been an enormous number of new programming languages coming out recently, the overwhelming majority of them are statically typed. Elben and others make the argument that this is because static languages are better equipped to deal with larger projects, they have better tooling, and programmers prefer them.

I'm fairly invested in dynamic languages. I happen to be completing a PhD thesis on techniques for optimizing dynamic languages for performance (JavaScript in particular). I've also been slowly working, in my spare time, on a dynamic language of my own. I like dynamic languages, but I think it's important, as a scientist and as a human being, not to stupidly fall into the trap of confirmation bias. I'm not going to vehemently defend dynamic languages, or claim that those who don't appreciate them are unknowing fools. Instead, I'll be honest and say that Elben's post, and the significant amount of agreement he found in online communities, made me pause and question myself. Are static languages inherently superior? Are they really winning?

That fewer dynamic programming languages are coming out is an undeniable fact. I've written code in statically typed languages such as C, C++, D and OCaml, and I agree that their type systems help catch certain classes of bugs more easily and rapidly. When writing code in JavaScript, you can run into nasty surprises: latent, trivial bugs that remain hidden in your code, sometimes for months, until some specific input causes them to manifest.

The problem here, though, is that JavaScript is a badly designed programming language. The JS origin story is that Brendan Eich originally designed the language in just two weeks. As such, it has many glaring flaws. If you compare Haskell, a language that came out of type theoretic circles and was carefully crafted by a large research community and extensively studied, to JS, a language designed in two weeks, you find that one looks like a NASA spacecraft and the other looks like a pile of fireworks on the 4th of July in Alabama.

Dynamic languages are at a disadvantage. Most of the mainstream ones out there today were designed by amateurs: people with no formal CS background, or no adequate background in compiler construction. They were designed with no regard for performance, and with an impractical mash of features that often work poorly together. Most of the dynamic languages you know are simply poorly crafted. This has resulted in some backlash. I think it's pretty clear that there's some amount of prejudice when it comes to dynamic languages.

In universities, computer science professors generally want little to do with dynamic languages. Compiler design courses are focused on statically typed and compiled languages. Type theoretic courses will teach you about Hindley-Milner type inference, but will leave you ill-equipped to understand dynamic typing. Students coming out of your average university compiler and PLT classes have no idea about the challenges involved in creating a JIT compiler, and know little about dynamic typing. It’s no big surprise that these students would go on to create statically typed programming languages.

There might be another factor at play here. Dynamic languages such as PHP, JS, Python and Ruby, in addition to being relatively poorly designed, are the languages that powered the explosion of the web. Nowadays, much (most?) of the programming happening in the world is web development. Much of this work is done by people with no formal CS background. The result is that you have millions of people with less formal education writing code in less well designed languages. JS, Python, PHP and Ruby, and by extension all dynamic languages, are seen by many academics as the programming languages of the riffraff, the unwashed masses, or maybe simply programming languages for relatively ignorant beginners.

Have static languages won? It seems to me that what people really like about static languages is IDE support for things like simple refactorings and autocompletion, along with program analyses that can provide some guarantees and find certain classes of bugs without having to run programs with every possible combination of inputs. It's perfectly legitimate for programmers to want these things. They help alleviate the cognitive burden of working with large (and small) codebases. But these advantages aren't inherently advantages of statically typed programming languages. I would argue that Smalltalk had (has) some amazingly powerful tools that go way beyond what the Eclipse IDE could ever give you.

I believe dynamic languages are here to stay. They can be very nimble, in ways that statically typed languages might never be able to match. We’re at a point in time where static typing dominates mainstream thought in the programming world, but that doesn’t mean dynamic languages are dead. So long as dynamic languages do offer advantages, either in terms of expressiveness or ease of learning, they will still be around. You should remember that, in the end, there is no war between static and dynamic languages. There are only tools and tradeoffs.

I will conclude by saying that in my view, programming languages are constantly evolving and influencing each other in what seems like an organic process. Features that are viewed as good ideas tend to spread from one language to many others. Think about closures, for instance. The functional crowd has been in love with closures since the days of LISP, and now almost all mainstream programming languages have closures. Going back to Elben Shira's post, he states:

We will see a flourishing of languages that feel like you’re writing in a Clojure, but typed. Included will be a suite of powerful tools that we’ve never seen before, tools so convincing that only ascetics will ignore.

Back in 2012, I wrote about my belief that static and dynamic typing can essentially be combined. You can have statically compiled languages that use type inference to realize what is effectively dynamic typing. That is, the compiler inserts union types where appropriate, and does so automatically. The Crystal programming language is a realization of this idea. This isn't static languages winning a war over dynamic languages, though. It's the influence of dynamic languages bleeding into static languages. The Crystal language developers are entirely honest about the fact that this language is based on Ruby. Their aim is to build a language that captures much of the flexibility of Ruby's dynamic typing while also providing static guarantees.
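To make the idea of automatically inserted union types more concrete, here is a toy type inferencer in Python. It is only a sketch loosely in the spirit of what Crystal does, not Crystal's actual algorithm: types are represented as sets of base type names, and an `if` expression whose branches produce different types gets the union of both. The expression format and the `needs_runtime_tag` helper are hypothetical.

```python
# Toy type inference that produces union types instead of rejecting
# programs where a value can have different types on different paths.
from typing import Dict, FrozenSet

# A type is a union of base type names, e.g. frozenset({"Int", "String"}).
Type = FrozenSet[str]


def infer(e, env: Dict[str, Type]) -> Type:
    """Infer the (possibly union) type of a tuple-based expression."""
    if isinstance(e, bool):  # check bool first: bool is a subclass of int
        return frozenset({"Bool"})
    if isinstance(e, int):
        return frozenset({"Int"})
    if isinstance(e, str):
        return frozenset({"String"})
    op = e[0]
    if op == "var":
        return env[e[1]]
    if op == "if":
        _, cond, then_e, else_e = e
        infer(cond, env)  # the condition must still be well formed
        # Either branch may run, so the result is the union of both.
        return infer(then_e, env) | infer(else_e, env)
    raise ValueError(f"unknown expression: {e!r}")


def show(t: Type) -> str:
    """Render a type as a sorted union, e.g. 'Int | String'."""
    return " | ".join(sorted(t))


def needs_runtime_tag(t: Type) -> bool:
    # Only union-typed values need a tag checked at runtime; values with
    # a single inferred type could use an unboxed representation.
    return len(t) > 1


if __name__ == "__main__":
    env = {"c": frozenset({"Bool"})}
    t = infer(("if", ("var", "c"), 1, "hello"), env)
    print(show(t), needs_runtime_tag(t))
```

Running it prints `Int | String True`. A compiler built this way would only pay for a runtime type tag where a union actually appears, and would treat everything else like a conventional statically typed language.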

Why I Chose Industry Over a Postdoc

About a year ago, I was considering postdoc options. Two university professors had invited me to join their research groups. I wasn’t quite sure what to do. I like research, and I like the work I’ve been doing as part of my PhD. On the other hand, I’ve been growing increasingly frustrated with academia, and more specifically, with the publication game. I’ve had papers rejected several times now. More than once, reviewers who were clearly associated with competing research projects (and did not try to disguise this fact) have shot down my work with unfair, intellectually dishonest and sometimes hostile criticism. In general, I’ve come to feel that, at least in my sub-field, the exploration of new ideas is discouraged, and I’m not getting judged on the quality of my work, I’m getting judged on how well I play the publication game.

The end of my PhD is just a few months away now, and I had to make a choice. I was tempted to continue my research on basic block versioning, but the prospect of working very hard and maybe not being able to publish a single paper made me uncomfortable. Another issue is that the two professors who approached me for postdocs made it clear that I should apply for a postdoc scholarship. They didn't have enough money to pay me, and if I didn't get this scholarship, I couldn't do a postdoc. They were also putting pressure on me to decide as fast as possible; I came to understand that postdoc positions are limited and that it's a very competitive environment.

Recently, I attended a conference and got to meet a researcher who's pretty well known in my field. He's someone I really look up to, someone whose name I'd seen on several papers that have shaped the development of my own research. We had dinner together a few times during the conference, and discussed various topics. One of the things that really struck me, though, is that this guy is in the process of hopping from postdoc to postdoc. He's struggling to publish his research, getting many of his papers shot down, and having difficulty finding a position as a university professor. He's a much better academic than I am, and he's still struggling.

In the meantime, I never really looked for a job, but I’ve been approached by IBM, Facebook, Microsoft, Google, Apple, Twitter, Amazon, Autodesk, AppNexus, Two Sigma, Reservoir Labs, D-Wave, and a few startups. It’s been a stark contrast. On the one hand, academia is offering me a chance to maybe do a postdoc, but only if I’m deemed good enough by the people who judge scholarship applications, and I have to decide now. On the other hand, industry people are bending over backwards to try and get me to come talk to them. I decided to go out and try interviewing for some of these companies, and over a month ago, I made my decision. I signed a generous offer from a company in the bay area.

I have no illusions that industry is some amazing utopia. I’m sure it will take me some time to adapt, and that I’ll miss some of the perks of being an academic. I know I’ll also miss Montreal, the city where I was born and raised. Still, I’ve been in university for over 11 years now, and I really think it’s time for me to try something different. I think that if I continued on the academic path, I’d be headed for stagnation and a burnout. Industry, in contrast, seems full of opportunities to explore. And hey, it won’t hurt that I’ll be making over six times what I get as a PhD student. For the last two years, I’ve been renting a tiny bedroom with a window on a noisy street, and sleeping on an uncomfortable futon that’s hurting my back. One of the first things I’m buying when I make it to California is a queen-sized bed, and the best mattress that money can buy.