
ZetaVM, my new compiler project

Like many of you I’m sure, I’ve wanted to create my own programming language for a long time. I think it’s a common aspiration for many programmers, to create a language with all the features we love and none of the ones we hate. Our own “ultimate programming language”, created in our own image. What I’ve come to realize over time, however, is that I’m actually not quite sure what should go into my ultimate programming language. I have a lot of ideas, but they don’t necessarily fit into a coherent whole. It’s also the case that my ideas about what the “ultimate programming language” should be like keep changing as I gain more programming experience and get exposed to new ideas.

My PhD was in compiler design, and this is something I truly enjoy playing with. As such, I’ve decided that my next programming language project wouldn’t be a programming language per se, but a compiler: a platform for creating new programming languages. I’m doing this in part because I enjoy it, and it’s something I feel confident I’m good at, but also because I think I can build a platform that will make it much easier for myself and others to do programming language design and experimentation. One of ZetaVM’s main design goals is to make creating new programming languages very accessible. It will make it possible for anyone who’s mastered a language such as Python to create a language of their own, in less than 2000 lines of code. Not only that, but ZetaVM will instantly make your language decently fast. It will have JIT optimizations suitable for languages such as Python/Lua/JavaScript, and will instantly give you fast objects, type specialization, etc.

ZetaVM is a virtual machine for dynamic programming languages. It will provide native support for dynamic typing and the most common data types found in Python/Lua/JS/Ruby, such as strings, extensible objects, and extensible arrays. What makes it particularly easy to get your own language running on this VM is that Zeta’s Intermediate Representation (IR) is representable in a textual format similar to JSON. This makes it fairly trivial for you to write, say, a parser for your new language in Python, and generate textual Zeta IR as output. You don’t have to worry about implementing dynamic typing, register allocation, garbage collection, or arrays and objects; all of that is done for you. I’ve created a simple language called Plush (JS and Lua’s bastard child), which demonstrates how this can be done, and serves to help me bootstrap and test the system.
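
To give a rough idea of what this looks like, here is a minimal sketch of a frontend emitting a JSON-like textual IR. The structure and instruction names below (“unit”, “add”, “ret”) are made up purely for illustration; Zeta’s actual IR format may look different, but the principle is the same: the frontend builds a tree of plain objects and serializes it.

# Minimal sketch of a frontend emitting a JSON-like textual IR.
# The node and instruction names used here ("unit", "add", "ret") are
# hypothetical, for illustration only.
import json

def compile_add_program(a, b):
    unit = {
        "type": "unit",
        "functions": [{
            "name": "main",
            "blocks": [{
                "name": "entry",
                "instrs": [
                    {"op": "add", "args": [a, b], "out": "t0"},
                    {"op": "ret", "args": ["t0"]},
                ],
            }],
        }],
    }
    # Serialize the in-memory tree into a textual form the VM could load
    return json.dumps(unit, indent=2)

print(compile_add_program(1, 2))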

Beyond making it easy for myself and others to create programming languages, Zeta will be a platform for me to try some bold new ideas in the design space of programming languages and JIT compilers. I would like to try and tackle one of the biggest issues plaguing programming languages today, which is that of code rot. My goal is to eventually freeze the IR and APIs provided by Zeta, so that code that runs on Zeta today might have a chance of still working in 20 years, without any changes. This goal is ambitious, but I have some ideas which I believe might make it work.

Finally, one big disclaimer I should give is that Zeta is still a young and immature project. In its current state, Zeta is experimental and will have many breaking changes, as most new languages/platforms do. Zeta also currently only has a naive interpreter which walks the object-based IR and is dog slow, at about 200K instructions per second. I’m currently working on an interpreter that will compile the object-based IR into a lower-level internal IR. This interpreter will use Basic Block Versioning (BBV) and self-modifying code. I believe it should realistically be able to reach speeds of 100 MIPS within the coming months. My plan after that is to build a lightweight JIT which will sit on top of the optimizing interpreter and compile the internal IR to machine code.

Technodiversity

The year is 2048. The migration to IPv6 has just been completed. Every object in the world, including your coffee mug and the chair you’re sitting on, comprises a manycore RISC-V cluster, running Linux, with its own unique IP address. Haskell, because of its mathematically provable superiority, has come to supplant every other programming language. Writing code in any language other than Haskell is now a misdemeanor, punishable by up to 64 days of bandwidth throttling and a fine of up to 0.125 BTC.

Wouldn’t things be much simpler if every computer system was built on the same platform, ran the same operating system, and every program was written in the same programming language? I think that in many ways, this is the wet dream of many programmers. No more dealing with cross-language boundaries, portability issues, and multiple incompatible implementations of subpar standards. Things could be much simpler and more efficient than they are now.

The biggest problem, however, is that in a world where every computer system runs the same software, the same version of the same operating system, on the same hardware, every computer system has exactly the same bugs and security weaknesses. Given that some security flaw exists, a single computer virus could potentially contaminate every computer system in the world in a very short amount of time. In today’s world, this Hollywood-like doomsday scenario seems extremely implausible. The real world is too messy, or too diverse, for it to be practical to build a single virus or worm that could infect every system.

In a lot of ways, the chaos of the technological world resembles that of nature. Like animal species competing for survival, various technologies and standards compete for funding and mindshare. In nature, species specialize to exploit new environments. Diversity, in the technological world, exists in part because specialization makes systems more efficient, which allows the exploitation of new market niches. In nature, genetic diversity, or biodiversity, makes it near-impossible for a single virus to wipe out the entirety of life on earth.

Today’s technological world is definitely messy, but we can take comfort in the fact that competition really does foster innovation. The technological diversity that exists right now is a sign that we live in a thriving ecosystem, as opposed to one that is stagnant. With self-driving cars and a variety of home automation devices on the horizon, we can also take comfort in the idea that technodiversity may actually be keeping us safe.

 

The Brain’s Registers

In the last few years, there’s been a lot of impressive work done with neural networks. One of the most interesting things I’ve seen is Word2vec, a technique for computing word embeddings, that is, for mapping English words to large vectors of neural activations. This is interesting because the resulting latent space has some remarkable properties. Concepts that are semantically close end up close together in the vector space, and it’s possible to do arithmetic on word vectors, giving us results such as ‘queen – woman + man ≈ king’.
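
If you want to try this yourself, the gensim library makes the experiment easy. A small sketch follows; the pretrained model file name is just an assumption, and any word2vec-format embedding file will do.

# Word vector arithmetic with gensim. The pretrained model file is an
# assumption; substitute any word2vec-format embedding file you have.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# queen - woman + man should land near "king" in the embedding space
print(vectors.most_similar(positive=["queen", "man"], negative=["woman"], topn=3))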

The ability to do approximate arithmetic on word vectors is nifty, but what I think is most interesting is that Word2vec shows we can translate fuzzy human concepts, such as English words or sentences, into an encoding from which computers can extract semantic information. There are now machine translation models based on recurrent neural networks which take a sentence in a language X, generate an activation vector, and then feed that vector into another recurrent neural network trained to generate text in another language Y. These machine translation models have reached the point where they are competitive with traditional statistical models.

The ability to create an intermediate representation for machine translation is quite powerful, but vectors of neural activations can be used to encode semantic information contained in many things besides written language, and also to transfer information from one modality to another. Researchers at Google have also shown that you can generate captions from images by first encoding images into semantic vectors, and then feeding those vectors into a recurrent neural network which generates text. Hence, a vector of neural activations can be used to represent semantic information, meaning, maybe even concepts and ideas in an abstract space that is agnostic to input and output modalities.

I hope this doesn’t seem too far-fetched, but I’m going to suggest that it’s highly likely that such semantic vectors are used to convey meaning and transfer information between different representations inside the human brain, particularly in the areas of the prefrontal cortex (PFC) that implement higher-level thinking. I’m going to go even further, and make the prediction that the PFC implements something analogous to the registers found in a microprocessor. Semantic registers which are used to buffer, store and transfer vectors of neural activations that encode meaning, concepts or ideas.

There is obviously still a lot we don’t know about the human brain, or the way in which it implements higher-level thinking, and the neurons in artificial neural networks are very loose approximations of their biological cousins. Still, bees, bats and birds use wings to fly. These are different kinds of wings, with different properties, which have evolved independently, but they are all wings nonetheless. All three have come to evolve a similar solution to the problem of flight because this solution is natural and efficient. In the same vein, I think you could make the argument that a register is a natural concept when it comes to shuffling and operating on data.

The registers in a microprocessor hold small pieces of temporary data. They store vectors of bits and typically have a few input and output ports. They implement simple operations, such as the ability to reset/erase the contents stored, and the ability to open or close the flow of information to/from various components inside the processor. In digital hardware, opening and closing of data ports is accomplished by gating signals with transistors. In biology, the gating of signals is accomplished by inhibitory neurons.
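
To make the analogy concrete, here is a toy software model of a gated register, nothing more than an illustrative sketch: data only moves through a port when the corresponding gate signal is open.

# Toy model of a register with gated input/output ports (illustrative only).
class GatedRegister:
    def __init__(self, width):
        self.bits = [0] * width

    def reset(self):
        # Erase the stored contents
        self.bits = [0] * len(self.bits)

    def write(self, value_bits, write_enable):
        # The input port is gated: the write only happens if the gate is open
        if write_enable:
            self.bits = list(value_bits)

    def read(self, output_enable):
        # The output port is gated as well: a closed gate lets nothing through
        return list(self.bits) if output_enable else None

r = GatedRegister(8)
r.write([1, 0, 1, 1, 0, 0, 1, 0], write_enable=True)
print(r.read(output_enable=True))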

According to Wikipedia, working memory is “a cognitive system with a limited capacity that is responsible for the transient holding, processing, and manipulation of information”. It’s believed that human beings are limited to being able to hold only about seven items at a time in working memory. Could it be that this temporary store of information is the human brain’s equivalent of a CPU’s register file, built out of a relatively small set of neural registers, with inhibitory neurons gating their outputs? I believe this could begin to explain our amazing (and yet limited) ability to shuffle and operate on concepts and ideas stored in our working memory.

In a typical CPU, the register file is connected to components such as an Arithmetic and Logic Unit (ALU). The ALU can read data from one or more registers, and perform operations such as addition, subtraction and multiplication between the values read, and store the result back into registers. Values can also be compared and the result of such comparisons used to decide which action the CPU should take next. Human beings are not born with the ability to compute signed integer multiplications, but there are certain high-level thinking faculties which may be innate, such as the ability to reason by analogy.

One could imagine that, somewhere in the PFC, there may be a dedicated component which is able to load semantic registers and draw analogies between semantic vectors, to answer the question of whether A is to B as C is to D, and perform some action based on the result. It could be that the part of our brain that does high-level thinking contains a number of operators that connect to one or more semantic registers and perform various operations on the concepts represented. These would be our brain’s ALU: the set of primitives that enable our high-level cognitive abilities.

As Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.” Maybe my attempt to make sense of higher-level human thinking by drawing analogies with computing hardware is misguided. Maybe working as a compiler engineer and looking at machine code for too long has warped my own cognition beyond repair. The ideas I’m proposing here are largely speculative, but maybe I’m right, and maybe, within a decade or two, someone will begin to map out the circuits that constitute the brain’s registers and operators.


Method Call Syntax in a JS-like Language

As a side-project, I’m working on a small JavaScript-like programming language called Plush. Like JavaScript, this language is going to have objects and method calls. In this blog post, I’m going to discuss some seemingly trivial aspects of method call syntax and semantics which illustrate why programming language design is difficult, and solicit your feedback as to potential solutions.

Like JavaScript, Plush has object literals which can be defined with the curly brace syntax. It will also have function objects, which are in fact closures. In JavaScript, you can assign a function to an object property, and it becomes callable as a method, like so:

o = {}; // empty object
o.m = function (x) {…};
o.m(2); // call the function in property m as a method

This all seems fine and dandy, but there are some issues with the way JavaScript implements this. The first is that the method call syntax introduces some “hidden” semantics. When you call a JS function as a method, a hidden this argument (a reference to the object the method is called on) gets passed to the function. This is invisible to the programmer. Hence:

// The statement below:
o.m(2);

// Is not equivalent to these statements:
f = o.m;
f(2);

// But it is equivalent to these two statements:
f = o.m;
f.call(o, 2);

The passing of the hidden this argument is a little bit annoying because it breaks the programmer’s intuition in a small way. JS syntax makes method calls look like regular function calls, but there is a difference. Sure, it’s not a big deal, but I ran into this issue while implementing IO functions in Plush. I wanted to implement some C++ “host” functions which would allow the language to interface with the outside world, like so:

io = import "core.io"; // import the IO module

io.print("hello world!"); // here 'print' is a host function written in C++

Maybe you see where this is going. If I implement method calls the way JavaScript does, then every function, including host functions, needs to take a hidden this argument, even when it doesn’t need it. I could work around this by making a special case where host functions called as methods behave differently, but then I potentially have to add an extra dynamic check at every function call. This gets annoying. Alternatively, I can wrap my C++ host functions in Plush functions which handle the unneeded this argument.

The second issue I ran into involves the way the this argument is handled. JavaScript has this problem where, if you declare a closure inside a method, you can’t directly access the parent function’s this argument, as described in this blog post. This annoying quirk of the language was “solved” by the introduction of Function.prototype.bind. The JS workaround seemed like an ugly kludge to me, and so I thought: why not just do it like Python? Why not make the this argument explicit, and make programmers declare and name the this value, in the same way that Python forces you to declare the self argument in methods?

Upon first examination, making the programmer explicitly declare the this argument seems like a fairly good idea. However, it does have the annoying consequence that an argument which was previously hidden is now a positional argument. Consider the following scenario:

// An IO module is imported, this produces an object containing IO functions
io = import "core.io";

// The print method is “extracted”, so that we can call it with a shorthand name
print = io.print;

// This works fine. It passes a hidden this argument value to the print method
io.print(2);

// ERROR: we are not passing a this argument value
print(2);

In the case of modules, it’s clear that the print function shouldn’t even need a this argument value. I pondered this problem, and had the idea that possibly, method calls could have a different syntax from property accesses. The “arrow syntax” would make explicit the passing of the this argument:

// Call of the method m of object o
// The “arrow syntax” is specifically for method calls, and passes a this argument
o->m(2);

// Equivalent function call, passing an explicit this argument
o.m(o, 2);

io = import "core.io";

// Regular function call, print does not take a this argument, it is not a method
io.print(2);

// This works correctly, as one might expect
print = io.print;
print(2);

The solution I came up with is not perfect; it raises some potential problems. For one, with a special arrow syntax for method calls, it becomes possible to call object properties using both regular function calls and arrow-style method calls which pass a this argument. Experience tells me that if both styles are possible, people are going to use both, which could get messy. For example, what do you do if you have a set of methods which don’t need a this argument? Do you declare one anyway? Would you end up with a mix of regular function calls and method calls on the same object?

So, I’ve been using gedit this whole time

This is my coming out. I’m 31. I’ve been programming since I was 16, but all this time, I’ve never learned how to use a “proper” text editor.

It’s probably because, as a teenager, I grew up with Windows 98, and on that platform, back then, command-line tools were very much second-class citizens. In university, I was introduced to Linux and somehow, over the years, I became hooked. I spent a total of 11 years in university, and over the course of those years, I probably wrote over 400K lines of code. Most of it on Linux, and most of it using a text editor called gedit.

It’s the GNOME text editor. Its noteworthy features are syntax highlighting and the ability to have multiple tabs open at once. It’s a fairly shitty text editor. It will freeze up if you try to open a text file larger than a few hundred kilobytes, or one with lines that are too long for it to properly digest. If you were to think of emacs as a sushi meal prepared by an experienced chef, then you could think of gedit as a microwaved bowl of ramen with a half cup of sriracha sauce dumped on it. I had the ramen right here at home, no cash on hand, and well, I was hungry.

I think the problem is that, in university, there was never any class that was really about teaching us how to properly and efficiently use tools. I learned about programming languages, operating systems, compiler design, and all those wonderful things, but all this time, I was never given the opportunity to sit down and think about the tools I was using. The craftsmanship of programming, so to speak. I started using gedit because it was the default on the Linux distro my university ran at the time. I stuck with it out of habit.

During my undergrad, we were given assignments with tight deadlines. We worked our asses off. Some semesters, I remember having as much as 40 hours of homework per week. Needless to say, there was a lot of pressure. Pressure to get things done quickly. Given tight assignment deadlines, I didn’t really feel like spending 10 or 20 hours familiarizing myself with vi or emacs, or dealing with people who would tell me to go RTFM (Read the Fucking Manual). I went with gedit because it was intuitive and comfortable, as shitty as it was.

At my current workplace, we run MacOS, and well, while there is a port of gedit (not even kidding), it’s a fairly miserable experience. The MacOS port of gedit is like one of those freak creatures in a horror movie that begs you to pkill -9 it. Not knowing where to run, I started using the GitHub Atom editor. It’s alright, but like gedit, it has its annoying quirks.

I’m 31, and I can’t help but want to take a step back. I don’t know how to use vim, but I think I’d like to learn. I can appreciate that a lot of thought was put into its design. It’s not a gimmicky gadget, it’s a powerful tool: a programmer’s text editor. I’m trying to convince myself that the time investment is worth it. At the very least, I can appreciate that vim is much more cross-platform, and stable across time, than any other text editor I’ve ever put up with.

My Experience with the ESP8266 – Making an LED Strip I can Control from my Shell

IoT is one of those buzzwords that’s been thrown around so much that it’s become largely synonymous with disappointing marketing hype. Still, home automation, in principle at least, has a lot of potential. The ESP8266 chip came out about two years ago, and it’s been drawing a lot of interest from the “maker” community. This inexpensive chip, for those who don’t know, incorporates a wifi module, flash memory, and a small 32-bit Xtensa core in a tiny package. This has had many people excited, because it means all kinds of electronic projects can be connected to wifi networks for only a few dollars in hardware cost.

I’ve known about the ESP8266 for a while, but until now, it wasn’t so interesting. Early versions of the chip had only a handful of I/O pins. You also needed to install some clunky SDK provided by the vendor to program it, and the instructions were not straightforward. Think tons of dependencies and lots of boilerplate code. Thankfully, this isn’t the case anymore. It’s now possible to get a NodeMCU module on eBay for less than $3.50 shipped, or $7 on Amazon. This module has many I/O pins, its own 3.3V voltage regulator, its own USB interface for programming, and best of all, is programmable with the Arduino IDE.

I’ve recently completed two projects with the ESP8266. One of them is a wifi-enabled power outlet that can be remotely switched on and off. The other is an RGB LED strip whose color can be changed remotely. I work with a Linux machine at home, and needed to update my udev rules in order for the USB interface of my NodeMCU modules to be recognized so that I could program them. Besides that, the whole process has been almost seamless.

The components I used to interface a NodeMCU board with a 12V RGB LED strip

There is a simple “Hello World!” web server example that comes with the Arduino core package, and this example is less than 100 lines long. Making a wifi-enabled LED strip, on the software side, was a simple matter of parsing web request arguments for red, green and blue color values, and translating these into PWM intensity values to control output pins. Hardware-wise, I connected three 2N2222 transistors to the D0-D2 output pins, which are able to handle the current required to drive my one meter LED strip. The real beauty of it though, is that I can control the LED strip with shell commands, by issuing HTTP requests, which opens up the realm of scripting home automation:

# Make it red!
wget -O- "192.168.0.56/?r=255&g=0&b=0"
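
And once you can issue HTTP requests, you can script the device from any language. Here is a small sketch in Python that fades the strip from red to blue; it reuses the same IP address and r/g/b query parameters as the wget example above, and assumes the firmware interprets them as described earlier.

# Fade the LED strip from red to blue over a few seconds.
# Uses the same address and query parameters as the wget example above.
import time
import requests

BASE_URL = "http://192.168.0.56/"

def set_color(r, g, b):
    requests.get(BASE_URL, params={"r": r, "g": g, "b": b})

for i in range(0, 256, 4):
    set_color(255 - i, 0, i)
    time.sleep(0.05)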


I intend to play with more of these in the future. There are now LED strips with individually-addressable LEDs, which seems like a lot of fun. I would also like to make a box with temperature, light and motion sensors that I can query remotely. If you’re interested in working with the ESP8266, I personally recommend buying a NodeMCU instead of a bare module. It will make your life much, much simpler.

Silicon Valley, Six Months Later

Maybe you’ve been wondering why I haven’t posted on this blog in over four months. The main reason is that my energy has been going elsewhere. In February, I defended my PhD thesis and relocated to the South SF Bay area (aka Silicon Valley) to work in tech. Most of my energy, since then, has been spent adapting to a new job and settling into a new environment. To be completely honest, it’s been difficult. I defended my thesis, prepared for an international move and started a new job all within one month. Hindsight is 20/20: if I had to do it over, I would take two or three months off after completing my degree to give myself a buffer to complete this transition.

Six months later, I have mixed feelings. I’m not sure how I feel about Silicon Valley. I picked an apartment in the South SF Bay because I wanted to be close to work. I’m able to bike to work, which is nice. However, as someone who grew up and lived all her life in a big city, the area feels very suburban and well, boring, to me. In all likelihood, part of the problem is that I don’t have many friends here. Once the work day is over, I often feel very lonely. It might sound cheesy, but this is probably what I regret most about leaving my hometown: leaving my friends and family behind.

I try to look at the bright side: I’m happy with my job. I like my boss and my coworkers. People here work reasonable hours and the work is challenging. I have my own place for the first time in my life. It’s a big apartment in a quiet neighborhood, and it’s a comfortable place to live. Silicon Valley is a huge tech hub, so it’s a good opportunity for me to learn a lot. Because I don’t go out much these days, I’ve actually made a decent amount of progress on my programming language project in my spare time. I also installed an electric motor on my bike and completed some electronic projects. These things give me some satisfaction. Still, loneliness is hard, and I don’t quite feel at home here yet.

 

Optimizing Ray Marching through Partial Evaluation

I love ray tracing as a 3D rendering technique. I love it because, in its purest form, it’s very elegant and beautiful. It’s a simulation of light and photons flowing through an environment. Because of its physically-based nature, ray tracing can produce very realistic images from very simple code. You can render impressive scenes with just a few hundred lines of C. This is in stark contrast with the state-of-the-art polygon rasterization techniques used in modern 3D engines, which make use of a huge number of very complex hacks to get something that tries to look somewhat realistic.

Traditional Whitted-style ray tracing is done with intersection tests, that is, the scene is rendered by testing if rays coming out of the camera and going into the scene intersect with various objects. With this style of rendering, in order to be able to render a 3D object, you need to be able to write a piece of code that checks whether or not a given ray intersects with the object, and if so, where the nearest intersection occurs (there might be multiple). This is relatively straightforward if you want to render something like a sphere, but it quickly gets more complicated when rendering more complex parametric surfaces.

Recently, I fell in love when I discovered a ray tracing technique that in many ways is even simpler yet more powerful than classical ray tracing. The technique, known as ray marching, or sphere tracing, makes it possible to render a scene that is directly defined by a Signed Distance Function (SDF). This is a function that, given a point in space, returns the closest distance to any point that is part of the scene. The distance is positive if the point is outside of scene objects, zero at the boundary, and negative inside scene objects.

The beauty of this is that SDFs are extremely simple to define. For instance, the signed distance from a point to a sphere is simply the distance between the point and the center of the sphere, minus the sphere’s radius. Better yet, SDFs can be combined together in very simple ways as well. You can produce the union of two objects by computing the minimum of both of their SDFs, and the intersection by computing the maximum. You can also do funky things such as infinitely repeating objects using modulo operators. SDFs truly have a lot of expressive power due to their combinatorial nature.
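
To make this concrete, here is a small sketch of these ideas in plain Python (a real implementation would live in a GPU shader, but the logic is the same): a sphere SDF, union and intersection by taking the min and max, and the basic ray marching loop that steps along the ray by the distance the SDF guarantees to be empty.

# Minimal SDF / ray marching sketch (illustrative; a real renderer would do
# this per pixel in a GPU shader).
import math

def sphere_sdf(center, radius):
    def sdf(p):
        dx, dy, dz = p[0] - center[0], p[1] - center[1], p[2] - center[2]
        return math.sqrt(dx * dx + dy * dy + dz * dz) - radius
    return sdf

def union(a, b):
    return lambda p: min(a(p), b(p))

def intersection(a, b):
    return lambda p: max(a(p), b(p))

def ray_march(sdf, origin, direction, max_steps=128, eps=1e-4, max_dist=100.0):
    # Step along the ray by the distance the SDF guarantees to be free of geometry.
    t = 0.0
    for _ in range(max_steps):
        p = (origin[0] + t * direction[0],
             origin[1] + t * direction[1],
             origin[2] + t * direction[2])
        d = sdf(p)
        if d < eps:
            return t  # hit: distance along the ray
        t += d
        if t > max_dist:
            break
    return None  # miss

scene = union(sphere_sdf((0, 0, 5), 1.0), sphere_sdf((2, 0, 6), 1.0))
print(ray_march(scene, (0, 0, 0), (0, 0, 1)))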

The demoscene group Mercury has produced some amazing demos of 3D animations based on the rendering of SDFs on the GPU. These demos are entirely procedural and fit in a tiny 64 kilobytes! The Timeless demo, in particular, showcases the kinds of spatial transformations and deformations of space that can be done when your scene is entirely defined by a mathematical function, as opposed to polygons. There are many interesting SDF code examples on ShaderToy if you’re interested in playing with them.

Hopefully, by this point, I’ve convinced you that SDFs and ray marching are insanely cool. It’s also pretty amazing that modern GPUs are now fast and flexible enough that you can render fairly interesting SDFs in real-time. Unfortunately, SDFs remain expensive enough to render that it’s still tricky to render complex scenes. I think it would be great to live in a world where SDFs and real-time ray tracing could replace polygons, but it seems we aren’t quite there yet.

I spent some time thinking about SDFs, and it got me thinking: there’s got to be a way to make these faster. If you think about it, rendering an SDF is costly because there’s quite a bit of computation going on at every pixel: many evaluations of a potentially complex SDF. This computational work, however, is highly redundant. Do you really need to evaluate the entire SDF for every single pixel? Ray marching is very much a brute force technique. There just has to be a more efficient way to go about this.

SDFs are mathematical functions, pieces of code that get repeatedly evaluated. I come from a compiler background, and so I started wondering if compiler optimizations could be applied to these SDFs. What if, for instance, we could apply partial evaluation to optimize said distance functions and make them run faster? I did some research, and it turns out, unsurprisingly, that I wasn’t the first one to think of such a concept. There has already been work on applying partial evaluation to ray tracing and on applying partial evaluation to the optimization of shaders.

The work that’s been done on applying partial evaluation to ray tracing, in my opinion, doesn’t go far enough. The authors have essentially partially evaluated the ray tracing algorithm and specialized it with respect to the objects present in a given 3D scene. What I think would be very useful is to specialize the evaluation of SDFs with respect to the camera position. What if, for instance, you could mathematically prove that certain objects in your scene are not present in the left half of the view frustum? That is, what if you could prove that most of the objects in your scene are not visible in the left half of the screen?

It seems like it would be relatively straightforward to recursively subdivide the image and view frustum into multiple quadrants and determine which objects will definitely not be visible in each. In fact, I know this can be done directly using the definition of SDFs, without any very fancy tricks. If you could do this, then you could generate a separate, optimized SDF for each smaller fraction of the image to be rendered. What I’m essentially proposing is to generate optimized pieces of shader code on the fly for smaller areas of the screen, which can be evaluated much faster than the SDF for the entire scene.
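
Here is a rough sketch of the culling step I have in mind, under the assumption that each sub-frustum (out to whatever far distance you care about) has been bounded by a sphere: by the definition of an SDF, if an object’s distance at the sphere’s center exceeds the sphere’s radius, no ray confined to that region can hit it, so it can be dropped from that region’s specialized SDF. The helper names below are hypothetical.

# Sketch of per-region SDF specialization by conservative culling.
# Assumes the sub-frustum for a screen region has been bounded by a sphere
# (region_center, region_radius); computing that bound is not shown here.
def specialize_scene(object_sdfs, region_center, region_radius):
    # An object whose SDF at the sphere center exceeds the radius lies
    # entirely outside the bounding sphere, so no ray in this region hits it.
    visible = [sdf for sdf in object_sdfs if sdf(region_center) <= region_radius]
    if not visible:
        # Nothing can be visible here: a constant "very far" SDF suffices.
        return lambda p: float("inf")
    # The specialized scene is the union (pointwise min) of surviving objects.
    return lambda p: min(sdf(p) for sdf in visible)

# Hypothetical usage, reusing sphere_sdf from the earlier sketch:
# region_sdf = specialize_scene(
#     [sphere_sdf((0, 0, 5), 1.0), sphere_sdf((50, 0, 5), 1.0)],
#     region_center=(0, 0, 5), region_radius=3.0)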

I don’t know how viable it is to compile and run fresh shader code every time the camera moves on a GPU, but I believe it might actually be possible, using this kind of optimization, to render SDFs on a multicore desktop CPU at decent frame rates.

Sense from Chaos – Crossing the Semantic Gap

Edsger Dijkstra once said:

“The question of whether machines can think is about as relevant as the question of whether submarines can swim.”

This was the view most AI researchers held back in the 1960s and 1970s. Back then, many thought that general-purpose AI could be achieved purely through symbolic manipulation. That is, it was thought that we could build machines that, through purely logical reasoning, would derive a sufficient understanding of the world to reach and even exceed human intelligence. This kind of vision of AI is illustrated in classic sci-fi novels, and embodied in the infamous HAL 9000 computer of 2001: A Space Odyssey.

The quest to build symbolic AI led to much research and impressive early successes. These successes led to great optimism, and the belief that computers would be able to effectively handle human language, machine translation and vehicle driving before 1980. Symbolic reasoning is well suited to reasoning about mathematics, or about small closed systems with very well-defined properties, such as the game of chess. Unfortunately, multiple dead ends were quickly reached. It was found, as philosophers had predicted, that you could hardly build a machine capable of reasoning about the real world through purely symbolic means. The problem was one of grounding.

Symbols in a vacuum don’t mean anything. You can create an ontology in which you define the concept of a chair, and you can put logical statements in this ontology as to how people use chairs for sitting on, and how chairs are movable objects that behave according to Newtonian laws of physics. However, if you’re trying to build a robot that can see and manipulate chairs, your ontology is basically worthless. You have two HD cameras providing you with 20 million pixels of raw data at 60 frames a second, and no logical statements are sufficient to help you tell where a chair might be in all that noise. Your machine has the concept of a chair, but this concept exists only in some Platonic realm that is completely divorced from the real world.

Ultimately, the grandiose predictions of early AI researchers proved much too ambitious, because the real world was much too difficult to cope with, or at least, much more difficult to cope with than mathematicians had hoped. This led to the first AI winter, with funding rapidly drying up for what is now known as GOFAI (Good Old Fashioned AI) research. For at least two decades after this, AI became kind of a dirty word in computer science circles. Those who still pursued AI-related research renamed their field machine learning, so as to avoid any association with the overhyped goal that we might one day build machines with human-level intelligence. From then on, machine learning researchers left behind the lofty goals of AI, and instead focused on the basics: small, narrowly-defined learning tasks where they knew they could make headway.

Neural networks are not new, but it’s only in the last few years that they have started to really shine, with deep learning. Advances in algorithms, access to large datasets and the unprecedented availability of computational resources have made it possible to scale this machine learning technique to networks with many layers of depth. Recently, some impressive feats have been achieved with deep neural networks, including object classification that exceeds human performance, and of course the much-discussed victory of the computer program AlphaGo over human Go champion Lee Sedol. Some amazing projects, such as neural artistic style transfer and deep networks that learn to synthesize new images, have also surfaced.

At the moment, universities are seeing a huge increase in student interest in deep neural networks, with classes sometimes tripling in size over previous years. Governments and industry alike are pouring billions into deep learning research. It’s undeniable: we’re in the middle of a machine learning boom. The optimism is so great that well-known researchers such as Yann LeCun and Yoshua Bengio are even daring to use the word AI again. There’s a lot of excitement (and some fear) about what deep neural networks will bring. People are becoming scared about robots taking human jobs, and the question of when computers will reach human-level intelligence is being asked.

To me, it really is a question of when. Provided that humanity doesn’t sink into a post-apocalyptic middle-age, there is no doubt in my mind that machines can and will reach human intelligence. I just don’t think that we’ll get to Artificial General Intelligence (AGI) in the way that most people think we will. Many seem to believe that we just haven’t come up with the right algorithm for human-equivalent intelligence, or that we just don’t have enough computational power. Clearly, to reach human-level intelligence, what we need is a deep neural network with a number of connections equivalent to that present in the human brain, right? I don’t think so. I don’t think that’s what we’re missing.

In my view, the human brain is not running just one algorithm. The human brain is not just a huge wad of neurons, a huge uniform neural network. The human brain is made of many different specialized components that do different things, connected together by a variety of pathways. Deep neural networks are awesome, they’re an amazing achievement, but they’re just one piece of the puzzle. What deep neural networks do is solve the perceptual problem. Deep learning allows us to do what the human visual cortex does. You get a million pixels of color information as input, and you turn this raw mass of data into a few classes of output. In short, with deep neural networks, we can turn real-world data into symbols.

There is no doubt in my mind that an AGI has to contain some sort of connectionist component, such as a neural network, within it. To make useful AI programs, however, the logical thing to do seems to be to assemble multiple specialized pieces together. In my view, AlphaGo is a beautiful illustration of this. It melds together multiple deep neural networks, which are used to do things such as assigning a value to different Go board configurations, along with a Monte Carlo tree search algorithm for looking at possible future moves. AlphaGo is very much a marriage of GOFAI techniques with the power of deep neural networks. Deep networks make sense of complex board configurations without the need for hand-written logical rules as to how individual Go stones should be counted. Deep networks do the more intuitive perceptual work, and good old fashioned tree search does the logical reasoning based on this data.

Deep neural networks bridge the semantic gap between classical computer systems, where symbolic entities are defined in absolute terms in databases, and the fuzziness of the real world, where exceptions are the norm and nothing can be entirely captured by absolute rules. If I had to guess, I would speculate that robots of the future, as they get more intelligent, are going to feature increasingly complex AI architectures made of multiple specialized components. There will be deep networks that do vision and hearing, and that perceive human faces and facial expressions. There will also be modules that do planning, navigation, logical/ontological reasoning and reasoning by analogy. All of this AI technology that we’ve been developing for the last 60 years is going to need to come together into a whole. That is one of the next big challenges. Making AGI happen won’t just be scientific work; it will also pose a plethora of engineering challenges.

I think it’s possible to build useful robots and human-equivalent AI without understanding the human brain in depth. However, I think that when we do build such AI, we’re necessarily going to converge towards an architecture that does many things in ways that are actually quite similar to what happens, in computational terms, inside the human brain. Submarines may not swim, but in order to move forward, they still have to displace water. It’s said that humans can only keep about 7 items of data in their working memory. It wouldn’t surprise me all that much if, one day, neuroscientists manage to map out the human frontal cortex, and they discover that in this region of the brain, there are neurons laid out in a system that implements what can essentially be thought of as a small set of general-purpose registers, each able to store a neural vector representing a semantic concept. In other words, our ability to reason by analogy and to manipulate abstract concepts in our mind is probably embodied by neural machinery that serves to perform symbolic manipulations.

Moving to the Valley

I haven’t talked much about my research work on this blog recently. That’s because I’ve been busy writing my thesis and wrapping things up. The good news is that I had my thesis defense two weeks ago, and I’ve completed the requirements to get my PhD. It feels a little surreal to have finally reached this goal, after six and a half years. The next step, for me, is a transition to working in industry. I considered doing a postdoc after my PhD, but it’s become fairly clear to me that although I do enjoy research, I can’t stand the way academia operates, and the focus on minimum publishable units.

At the moment, I’m in the process of selling, giving away and packing up my things in Montreal. I signed a job offer with Apple. I chose to work with them in large part because their recruitment process was very smooth and human, for lack of a better word. The people I interviewed with seemed relaxed, like they genuinely enjoyed their work. I know from experience that this is not the case everywhere. I’ve had interviews at tech companies that run giant cubicle farms, with people running everywhere and interviewers who looked nervous and overcaffeinated. Apple is quite secretive, and I don’t think I’m allowed to say much about my job description, but I will be working on compiler technology, and developing my skills in this area further, which is what I wanted.

I will be moving to the bay area in early March. It pains me to leave behind Montreal, the city where I was born and raised, as well as all of my friends here, but it’s also exciting to be moving to Silicon Valley, the Holy Mecca of the technological world, and to get a chance to work on cutting-edge technology. It will be nice to be able to meet more people who share similar interests on a regular basis. As someone who gets a bit emotionally down during the Canadian winter, it’s also uplifting to contemplate living in California, where there is plenty of sunlight and fresh produce all year long.

One of the decisions I have to make is where I’m going to live. Apple is based in Cupertino, which is about 45 miles from San Francisco. They have a shuttle which goes to the city, but that means two or more hours of commuting every weekday. I’ve lived in the city all my life, and I know that San Francisco is way more hopping than the bay area suburbs, but I find myself thinking that I should probably make my home closer to the Apple offices. If I live in the suburbs, I could probably bike to work every morning, a net positive for my health (whereas two hours of sitting in a bus would be a net negative). The suburbs also have slightly less insane rents than San Francisco, which means I could afford to have my own sweet 700 square feet of living space.

For those wondering, I do plan to keep on running this blog. I don’t know how much free time I can realistically expect to have, and how much energy I’ll be able to dedicate to coding after work hours, but I will try to keep advancing on my personal projects, and keep using this blog to discuss various topics I find interesting.