
The AI Risk Isn’t What You Think

Recently, a number of prominent figures, including Elon Musk, have been warning us about the potential dangers that could arise if we can’t keep artificial intelligence under control. Fear of AI is nothing new: novels and stories about robot takeovers go back as far as the 1920s, before the advent of computers. A surge of progress in the field of machine learning, and the subsequent investment of hundreds of billions of dollars into AI research by giants such as Google, Amazon, Facebook and Microsoft, has brought this fear back to the forefront. People are waking up to the fact that the age of AI and robots is coming soon, and that self-aware AI could very likely become a reality within their lifetime.

The 2014 book Superintelligence: Paths, Dangers, Strategies by Nick Bostrom embodies this fear. In his book, Bostrom details multiple scenarios in which AI could spiral out of control. I believe that the author has achieved his goal: he has successfully scared many researchers into paying attention to the existential threat surrounding AI, to the point where AI safety is now a serious field of research in machine learning. This is a good thing. However, I think that Bostrom’s book is in many ways alarmist, and distracts from some of the bigger, more immediate threats surrounding AI.

Many of the doomsday scenarios in Superintelligence center on the idea that AI entities will be able to rapidly improve themselves and reach “escape velocity”, so to speak, going from human-level intelligence to something far beyond it in a ridiculously short amount of time. In many ways, I believe this reflects a poor understanding of the field of machine learning, and of the way technology usually progresses. I see at least three factors that make this scenario unlikely:

  1. While the idea of an AI entity rewriting its own machine code may be seductive to sci-fi authors, the way deep neural networks operate now, they would be hard pressed to do such a thing, particularly if they weren’t designed with that purpose in mind.
  2. Currently, machine learning researchers are struggling to put together enough computational power to train neural networks to do relatively simple things. If an AI became self-aware tomorrow, it probably couldn’t double its computational power overnight, because doing so would require access to physical computing resources that simply aren’t there.
  3. Sudden, explosive progress is not how any past technology has developed. As rapidly as computers have evolved, it took decades to get from the ENIAC to the machines we have now, and there is no reason to think that AI will be radically different. So far, the field of machine learning has seen a fairly gradual increase in the capabilities of its algorithms, and it too has taken decades to get where it is now.

Silicon Valley likes to tell us that technological progress goes at an exponential rate, but fails to deliver any real evidence backing this dogmatic belief. In the case of self-aware AI, I think a more likely scenario is that we will be building machines with increasing levels of awareness of the world. We’ll build robots to clean up around our homes, and the first ones will be fairly stupid, limited to a small set of tasks. With newer generations, they’ll become capable of doing more and more, and of understanding more and more complex instructions. Until, eventually, you’ll be talking to a robot, and it will understand you as well as another human being would.

In my opinion, the advent of self-aware AI will require several more breakthroughs in machine learning. It may also require several generations of hardware that is designed with the sole purpose of accelerating neural networks. The good thing is that if self-aware AI takes a long time to emerge, the first general-purpose AIs will have a fairly limited understanding of the world, and limited computational capabilities. This means those first AIs will simply not be capable of taking over the world. It also means we may have several years to test a number of fail-safe mechanisms between the time where AIs start to have a useful understanding of the world, and the point where they are genuinely dangerous.

I think that, in some ways, the focus on the existential threat surrounding AI distracts us from a bigger, more immediate danger. AI is an immensely powerful tool. In the hands of giant corporations like Google and Facebook, it can be used to sift through every text message and every picture you post online. It can be used to analyze your behavior and control the information you see. The biggest risk posed by AI, in my opinion, is that it’s a tool that can be used to manipulate your life in ways that are useful to those who control the AI. It’s an incredibly powerful tool controlled by a very small few.

 


Zeta’s JITterpreter

About six weeks ago, I made ZetaVM open source and announced it on this blog. This is a compiler/VM project that I had been quietly working on for about 18 months. The project now has 273 stars on GitHub. This is both exciting and scary, because so much of what I want to do with this project is not yet built. I really want to show this project to people, but I also find myself scared that people may come, see how immature/incomplete the project is at this stage, and never come back. In any case, the project is open source now, and I think it would be good for me to write about it and explain some of the design ideas behind it, if only to document it for current and potential future collaborators.

One of the main goals of ZetaVM is to be very approachable, and enable more people to create their own programming languages, or to experiment with language design. With that goal in mind, I’ve designed a textual, stack-based bytecode format that resembles JSON, but allows for cycles (objects can reference one-another). Functions, basic blocks, and instructions in this bytecode format are all described as a graph of JS-like objects. This is very user-friendly. Anyone can write a Python script that outputs code in this format and run the output in ZetaVM. It’s also very powerful, in a LISP-y code-is-data kind of way: you can generate and introspect bytecode at run time, it’s just made of plain objects and values that the VM can manipulate. ZetaVM has first-class bytecode.
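
To make this a little more concrete, here’s a rough sketch of what a function described as a graph of plain objects might look like, written as a Python data structure. This is purely illustrative: the field names and opcodes below are invented for this post, not Zeta’s actual serialization format, which you should look up in the repository.

# Hypothetical sketch: a tiny "add one" function described as a graph of
# plain objects, in the spirit of Zeta's JSON-like bytecode format.
# The field names (entry, instrs, op, ...) are made up for illustration.
unit = {
    "type": "function",
    "num_params": 1,
    "entry": {
        "type": "block",
        "instrs": [
            {"op": "get_arg", "idx": 0},   # push argument 0 onto the stack
            {"op": "push", "val": 1},      # push the constant 1
            {"op": "add"},                 # pop two values, push their sum
            {"op": "ret"},                 # return the value on top of the stack
        ],
    },
}

# Because the bytecode is just data, a script (or the running program itself)
# can build, inspect, or rewrite it before handing it to the VM.
print(unit["entry"]["instrs"][1]["op"])   # prints "push"

Since these are ordinary objects, blocks can hold references to one another, which is how cycles such as loops end up being represented, and a running program can construct or introspect such structures directly.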

The downside of all this, as you can imagine, is that it inherently has to be absolutely dog slow. Or does it? The first version of the Zeta interpreter traversed the graph of bytecode objects the naive way, and it was indeed dog slow. I’ve since written a new interpreter which removes the overhead of the object-based bytecode by dynamically translating it to an internal representation (think dynamic binary translation). The internal IR is compact and flat, and executing it involves no pointer hopping.

The new interpreter generates code on the fly, which means it is, by definition, a Just-In-Time (JIT) compiler. Its architecture is based on Basic Block Versioning (BBV), a compilation technique I developed during my PhD (talks and papers on the topic are linked here if you’re curious). BBV has the nice property that it generates code lazily, and the generated code naturally ends up compact and fairly linear. This is not done yet, but BBV also makes it possible to specialize code to eliminate dynamic type checks very effectively, and to perform various optimizations on the fly.
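
For readers who don’t want to dig through the papers, here is a deliberately tiny Python sketch of the bookkeeping at the heart of BBV. The data structures and names are invented for illustration and are nowhere near Zeta’s actual implementation: the point is only that block versions are compiled lazily and cached per (block, type context) pair, so that each version can assume whatever type information was known when it was requested.

# Illustrative sketch of basic block versioning (invented structures, not
# Zeta's actual code): versions are compiled lazily, keyed by the type
# information known on entry, so that specialized code can skip type checks.
code = []        # flat buffer of compiled instructions
versions = {}    # (block id, frozen type context) -> start index in 'code'

def compile_version(block, type_ctx):
    """Return the start index of a version of 'block' specialized for type_ctx."""
    key = (block["id"], tuple(sorted(type_ctx.items())))
    if key in versions:
        return versions[key]
    start = len(code)
    versions[key] = start
    for instr in block["instrs"]:
        # A real compiler would consult type_ctx here to omit dynamic type
        # checks that the context already guarantees will succeed.
        known_type = type_ctx.get(instr.get("var"))
        code.append((instr["op"], known_type))
    return start

entry = {"id": "entry", "instrs": [{"op": "add", "var": "x"}]}
compile_version(entry, {"x": "int32"})   # version specialized for x being an int32
compile_version(entry, {})               # generic version, no type info
print(len(versions))                     # 2: two distinct versions were compiled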

You might be wondering why I’m bothering with an interpreter, instead of just writing a JIT that generates machine code. One of the motivating factors is that Zeta is still at a very early stage, and I think that an interpreter is a better choice for prototyping things. Another factor is that it occurred to me that I could potentially make Zeta more portable by having the interpreter do much of the compilation and optimization work. The interpreter can do type-specialization, inlining and various code simplifications.

The interpreter will be designed in such a way that the internal IR it produces is optimized and in many ways very close to machine code. It should then be possible to add a thin JIT layer on top to generate actual machine code. The resulting JIT will hopefully be much simpler and easier to maintain than if one were compiling directly from the raw object-based bytecode format. Another benefit of this design is that the optimizations the interpreter performs will not be tied to the specifics of x86 or any other architecture; they will remain portable.

At the moment, the new interpreter is at a point where it lazily compiles code into the flat internal format, but performs no other optimization. This was enough to get a 7x performance improvement over the naive interpreter, but the current system is still quite a bit below the performance level of the Python interpreter, and there is definite room for improvement. Some of the first optimizations I would like to introduce are the elimination of redundant branching instructions, and the use of inline caching to speed up function calls.

The elimination of redundant branches is fairly easy to do with BBV. Code is lazily compiled, and appended linearly into a buffer. When generating code for a branch, if the block we are jumping to is just about to be compiled, then the branch is redundant. BBV will naturally tend to generate code that flows linearly along hot code paths and branches out-of-line for infrequent code paths. That is, the default path usually falls through to the code that comes immediately next, requiring no branch at all.
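
As a toy illustration (again with invented structures, nothing like Zeta’s actual code), here is the essence of that rule: when emitting the jump at the end of a block, if the successor has not been compiled yet, compile it right there so execution simply falls through; only emit an explicit jump when the successor already lives somewhere else in the buffer.

# Toy sketch of redundant branch elimination under lazy, linear code
# generation (hypothetical structures, not Zeta's actual implementation).
code = []        # flat code buffer, filled in compilation order
compiled = {}    # block id -> start index in 'code'

def compile_block(block):
    compiled[block["id"]] = len(code)
    code.extend(block["instrs"])
    succ = block.get("jump_to")
    if succ is None:
        return
    if succ["id"] not in compiled:
        # Successor not compiled yet: compile it right here, so execution
        # simply falls through and no jump instruction is needed.
        compile_block(succ)
    else:
        # Successor already exists elsewhere in the buffer: emit a real jump.
        code.append(("jump", compiled[succ["id"]]))

body = {"id": "body", "instrs": [("work",)], "jump_to": None}
header = {"id": "header", "instrs": [("test",)], "jump_to": body}
compile_block(header)
print(code)   # [('test',), ('work',)] -- no jump between header and body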

Inline caching is a classic technique that was pioneered by the Smalltalk and SELF VMs. It’s used, among other things, to eliminate dynamic property lookups when polymorphic function calls are performed (see this excellent blog post for more information). Currently, when performing a function call, ZetaVM needs to read multiple properties on function objects. For instance, it needs to find out how many arguments the function takes, and what its entry basic block is. These property lookups are dynamic, and relatively slow. The end result is that the call instruction is very slow compared to other instructions.

Most call instructions will end up always calling the same function. Hence, the dynamic overhead of function calls can largely be eliminated by caching the identity of the function being called by a given call instruction. That is, one can cache the number of arguments and the entry basic block associated with a function object the first time a call instruction is run, and then reuse this information when calling the function again, provided we keep calling the same function. This information will be cached in line, in the instruction stream, right after the call instruction opcode, hence the name inline caching. I anticipate that with inline caching, function calls in Zeta can be made several times faster.
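
Here is a small Python sketch of that caching strategy (the data structures and field names are made up for illustration, not Zeta’s actual instruction encoding): the cache lives next to the call instruction and remembers the last callee, so repeated calls to the same function skip the slow property lookups entirely.

# Illustrative sketch of an inline cache for call instructions (hypothetical
# structures, not Zeta's actual encoding).
def slow_lookup(fn_obj):
    # Stand-in for the slow path: dynamic property reads on the function object.
    return fn_obj["num_params"], fn_obj["entry_block"]

def execute_call(call_instr, fn_obj, args):
    cache = call_instr.setdefault("cache", {})
    if cache.get("fn") is fn_obj:
        # Cache hit: reuse the values recorded the last time this site ran.
        num_params, entry = cache["num_params"], cache["entry"]
    else:
        # Cache miss: do the slow property lookups, then fill the cache.
        num_params, entry = slow_lookup(fn_obj)
        cache.update(fn=fn_obj, num_params=num_params, entry=entry)
    assert len(args) == num_params
    return entry   # a real interpreter would now jump to this entry block

add = {"num_params": 2, "entry_block": "add_entry"}
call_site = {"op": "call"}
execute_call(call_site, add, [1, 2])   # first run: cache miss, fills the cache
execute_call(call_site, add, [3, 4])   # later runs: cache hit, no property lookups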

ZetaVM, my new compiler project

Like many of you I’m sure, I’ve wanted to create my own programming language for a long time. I think it’s a common aspiration for many programmers, to create a language with all the features we love and none of the ones we hate. Our own “ultimate programming language”, created in our own image. What I’ve come to realize over time, however, is that I’m actually not quite sure what should go into my ultimate programming language. I have a lot of ideas, but they don’t necessarily fit into a coherent whole. It’s also the case that my ideas about what the “ultimate programming language” should be like keep changing as I gain more programming experience and get exposed to new ideas.

My PhD was in compiler design, and this is something I truly enjoy playing with. As such, I’ve decided that my next programming language project wouldn’t be a programming language per se, but a compiler: a platform for creating new programming languages. I’m doing this in part because I enjoy it, and it’s something I feel confident I’m good at, but also because I think I can build a platform that will make it much easier for myself and others to do programming language design and experimentation. One of ZetaVM’s main design goals is to make creating new programming languages very accessible. It will make it possible for anyone who’s mastered a language such as Python to create a language of their own, in less than 2000 lines of code. Not only that, but ZetaVM will instantly make your language decently fast. It will have JIT optimizations suitable for languages such as Python/Lua/JavaScript, and will instantly give you fast objects, type-specialization, etc.

ZetaVM is a virtual machine for dynamic programming languages. It will provide native support for dynamic typing and for the most common data types found in Python/Lua/JS/Ruby, such as strings, extensible objects, and extensible arrays. What makes it particularly easy to get your own language running on this VM is that Zeta’s Intermediate Representation (IR) is representable as a textual format similar to JSON. This makes it fairly trivial to write a parser for your new language in, say, Python, and generate textual Zeta IR as output. You don’t have to worry about implementing dynamic typing, register allocation, garbage collection, or arrays and objects; all of that is done for you. I’ve created a simple language called Plush (JS and Lua’s bastard child), which demonstrates how this can be done, and serves to help me bootstrap and test the system.

Beyond making it easy for myself and others to create programming languages, Zeta will be a platform for me to try some bold new ideas in the design space of programming languages and JIT compilers. I would like to try and tackle one of the biggest issues plaguing programming languages today, which is that of code rot. My goal is to eventually freeze the IR and APIs provided by Zeta, so that code that runs on Zeta today might have a chance of still working in 20 years, without any changes. This goal is ambitious, but I have some ideas which I believe might make it work.

Finally, one big disclaimer I should give is that Zeta is still a young and immature project. In its current state, Zeta is experimental and will have many breaking changes, as most new languages/platforms do. Zeta also currently only has a naive interpreter which walks the object-based IR and is dog slow, at about 200K instructions per second. I’m currently working on an interpreter that will compile the object-based IR into a lower-level internal IR. This interpreter will use Basic Block Versioning (BBV) and self-modifying code. I believe it should realistically be able to reach speeds of 100 MIPS within the coming months. My plan after that is to build a lightweight JIT which will sit on top of the optimizing interpreter and compile the internal IR to machine code.

Technodiversity

The year is 2048. The migration to IPv6 has just been completed. Every object in the world, including your coffee mug and the chair you’re sitting on, comprises a manycore RISC-V cluster, running Linux, with its own unique IP address. Haskell, because of its mathematically provable superiority, has come to supplant every other programming language. Writing code in any language other than Haskell is now a misdemeanor, punishable by up to 64 days of bandwidth throttling and a fine of up to 0.125 BTC.

Wouldn’t things be much simpler if every computer system was built on the same platform and ran the same operating system, and every program was written in the same programming language? I think that in many ways, this is the wet dream of many programmers. No more dealing with cross-language boundaries, portability issues, and multiple incompatible implementations of subpar standards. Things could be much simpler and more efficient than they are now.

The biggest problem, however, is that in a world where every computer system runs the same software, the same version of the same operating system, on the same hardware, every computer system has exactly the same bugs and security weaknesses. Given that some security flaw exists, a single computer virus could potentially contaminate every computer system in the world in a very short amount of time. In today’s world, this Hollywood-like doomsday scenario seems extremely implausible. The real world is too messy, or too diverse, for it to be practical to build a single virus or worm that could infect every system.

In a lot of ways, the chaos of the technological world resembles that of nature. Like animal species competing for survival, various technologies and standards compete for funding and mindshare. In nature, species specialize to exploit new environments. Diversity, in the technological world, exists in part because specialization makes systems more efficient, which allows the exploitation of new market niches. In nature, genetic diversity, or biodiversity, makes it near-impossible for a single virus to wipe out the entirety of life on earth.

Today’s technological world is definitely messy, but we can take comfort in the fact that competition really does foster innovation. The technological diversity that exists right now is a sign that we live in a thriving ecosystem, as opposed to one that is stagnant. With self-driving cars and a variety of home automation devices on the horizon, we can also take comfort in the idea that technodiversity may actually be keeping us safe.

 

The Brain’s Registers

In the last few years, there’s been a lot of impressive work done with neural networks. One of the most interesting things I’ve seen is Word2vec, a technique for computing word embeddings, that is, for mapping English words to large vectors of neural activations. This is interesting because the resulting latent space has some remarkable properties: concepts that are semantically close end up close together in the vector space, and it’s possible to do arithmetic on word vectors, giving us results such as ‘queen – woman + man ≈ king’.
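
As a toy illustration of what that arithmetic looks like in practice (using made-up 3-dimensional vectors rather than real Word2vec embeddings, which typically have a few hundred dimensions):

import numpy as np

# Toy 3-dimensional "embeddings" invented for illustration; real Word2vec
# vectors are learned from large text corpora and are much larger.
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

result = emb["queen"] - emb["woman"] + emb["man"]
# The word whose vector is closest to "queen - woman + man" should be "king".
best = max(emb, key=lambda w: cosine(emb[w], result))
print(best)   # prints "king"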

The ability to do approximate arithmetic on word vectors is nifty, but what I think is most interesting is that Word2vec shows we can translate fuzzy human concepts, such as English words or sentences, into an encoding from which computers can extract semantic information. There are now machine translation models based on recurrent neural networks which take a sentence in a language X, generate an activation vector, and then feed that vector into another recurrent neural network trained to generate text in another language Y. These machine translation models have reached the point where they are competitive with traditional statistical models.

The ability to create an intermediate representation for machine translation is quite powerful, but vectors of neural activations can be used to encode semantic information contained in many things besides written language, and also to transfer information from one modality to another. Researchers at Google have also shown that you can generate captions from images by first encoding images into semantic vectors, and then feeding those vectors into a recurrent neural network which generates text. Hence, a vector of neural activations can be used to represent semantic information, meaning, maybe even concepts and ideas in an abstract space that is agnostic to input and output modalities.

I hope this doesn’t seem too far-fetched, but I’m going to suggest that it’s highly likely that such semantic vectors are used to convey meaning and transfer information between different representations inside the human brain, particularly in the areas of the prefrontal cortex (PFC) that implement higher-level thinking. I’m going to go even further, and make the prediction that the PFC implements something analogous to the registers found in a microprocessor. Semantic registers which are used to buffer, store and transfer vectors of neural activations that encode meaning, concepts or ideas.

There is obviously still a lot we don’t know about the human brain, or the way in which it implements higher-level thinking, and the neurons in artificial neural networks are very loose approximations of their biological cousins. Still, bees, bats and birds use wings to fly. These are different kinds of wings, with different properties, which have evolved independently, but they are all wings nonetheless. All three have come to evolve a similar solution to the problem of flight because this solution is natural and efficient. In the same vein, I think you could make the argument that a register is a natural concept when it comes to shuffling and operating on data.

The registers in a microprocessor hold small pieces of temporary data. They store vectors of bits and typically have a few input and output ports. They implement simple operations, such as the ability to reset/erase the contents stored, and the ability to open or close the flow of information to/from various components inside the processor. In digital hardware, opening and closing of data ports is accomplished by gating signals with transistors. In biology, the gating of signals is accomplished by inhibitory neurons.

According to Wikipedia, working memory is “a cognitive system with a limited capacity that is responsible for the transient holding, processing, and manipulation of information”. It’s believed that human beings are limited to being able to hold only about seven items at a time in working memory. Could it be that this temporary store of information is the human brain’s equivalent of a CPU’s register file, built out of a relatively small set of neural registers, with inhibitory neurons gating their outputs? I believe this could begin to explain our amazing (and yet limited) ability to shuffle and operate on concepts and ideas stored in our working memory.

In a typical CPU, the register file is connected to components such as an Arithmetic and Logic Unit (ALU). The ALU can read data from one or more registers, and perform operations such as addition, subtraction and multiplication between the values read, and store the result back into registers. Values can also be compared and the result of such comparisons used to decide which action the CPU should take next. Human beings are not born with the ability to compute signed integer multiplications, but there are certain high-level thinking faculties which may be innate, such as the ability to reason by analogy.

One could imagine that, somewhere in the PFC, there may be a dedicated component which is able to load semantic registers and draw analogies between semantic vectors, to answer the question of whether A is to B as C is to D, and perform some action based on the result. It could be that the part of our brain that does high-level thinking contains a number of operators that connect to one or more semantic registers and perform various operations on the concepts represented. These would be our brain’s ALU: the set of primitives that enable our high-level cognitive abilities.

As Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.” Maybe my attempt to make sense of higher-level human thinking by drawing analogies with computing hardware is misguided. Maybe working as a compiler engineer and looking at machine code for too long has warped my own cognition beyond repair. The ideas I’m proposing here are largely speculative, but maybe I’m right, and maybe, within a decade or two, someone will begin to map out the circuits that constitute the brain’s registers and operators.

Method Call Syntax in a JS-like Language

As a side-project, I’m working on a small JavaScript-like programming language called Plush. Like JavaScript, this language is going to have objects and method calls. In this blog post, I’m going to discuss some seemingly trivial aspects of method call syntax and semantics which illustrate why programming language design is difficult, and solicit your feedback as to potential solutions.

Like JavaScript, Plush has object literals which can be defined with the curly brace syntax. It will also have function objects, which are in fact closures. In JavaScript, you can assign a function to an object property, and it becomes callable as a method, like so:

o = {}; // empty object
o.m = function (x) {…};
o.m(2); // call the function in property m as a method

This all seems fine and dandy, but there are some issues with the way JavaScript implements this. The first is that method call syntax introduces some “hidden” semantics. When you call a JS function as a method, a hidden this argument (a reference to the object the method is called on) gets passed to the function. This is invisible to the programmer. Hence:

// The statement below:
o.m(2);

// Is not equivalent to these statements:
f = o.m;
f(2);

// But it is equivalent to these two statements:
f = o.m;
f.call(o, 2);

The passing of the hidden this argument is a little bit annoying because it breaks the programmer’s intuition in a small way. JS syntax makes method calls look like regular function calls, but there is a difference. Sure, it’s not a big deal, but I ran into this issue while implementing IO functions in Plush. I wanted to implement some C++ “host” functions which would allow the language to interface with the outside world, like so:

io = import "core.io"; // import the IO module

io.print("hello world!"); // here 'print' is a host function written in C++

Maybe you see where this is going. If I implement method calls as JavaScript does, then every function, including host functions, needs to take a hidden this argument, even if it doesn’t need it. I can work around this by making a special case where host functions called as methods behave differently, but then I potentially have to add an extra dynamic check at every function call. This gets annoying. I could also work around the issue by wrapping my C++ host functions in Plush functions which handle the unneeded this argument.

The second issue I ran into involves the way the this argument is handled. JavaScript has this problem where, if you declare a closure inside a method, you can’t directly access the parent function’s this argument, as described in this blog post. This annoying quirk of the language was “solved” by the introduction of Function.prototype.bind. The JS workaround seemed like an ugly kludge to me, and so I thought: why not just do it like Python? Why not make the this argument explicit, and make programmers declare and name the this value, in the same way that Python forces you to declare the self argument in methods?

Upon first examination, making the programmer explicitly declare the this argument seems like a fairly good idea. However, it does have the annoying consequence that an argument which was previously hidden is now a positional argument. Consider the following scenario:

// An IO module is imported, this produces an object containing IO functions
io = import "core.io";

// The print method is “extracted”, so that we can call it with a shorthand name
print = io.print;

// This works fine. It passes a hidden this argument value to the print method
io.print(2);

// ERROR: we are not passing a this argument value
print(2);

In the case of modules, it’s clear that the print function shouldn’t even need a this argument value. I pondered this problem, and had the idea that possibly, method calls could have a different syntax from property accesses. The “arrow syntax” would make explicit the passing of the this argument:

// Call of the method m of object o
// The “arrow syntax” is specifically for method calls, and passes a this argument
o->m(2);

// Equivalent function call, passing an explicit this argument
o.m(o, 2);

io = import "core.io";

// Regular function call, print does not take a this argument, it is not a method
io.print(2);

// This works correctly, as one might expect
print = io.print;
print(2);

The solution I came up with is not perfect; it raises some potential problems. For one, with a special arrow syntax for method calls, it becomes possible to call object properties using both regular function calls and arrow-style method calls which pass a this argument. Experience tells me that if both styles are possible, people are going to use both, which could get messy. For example, what do you do if you have a set of methods which do not need a this argument? Do you declare one anyway? Would you end up with a mix of regular function calls and method calls on the same object?

So, I’ve been using gedit this whole time

This is my coming out. I’m 31. I’ve been programming since I was 16, but all this time, I’ve never learned how to use a “proper” text editor.

It’s probably because, as a teenager, I grew up with Windows 98, and on that platform, back then, command-line tools were very much second-class citizens. In university, I was introduced to Linux and somehow, over the years, I became hooked. I spent a total of 11 years in university, and over that time, I probably wrote over 400K lines of code. Most of it on Linux, and most of it using a text editor called gedit.

It’s the GNOME text editor. Its noteworthy features are syntax highlighting and the ability to have multiple tabs open at once. It’s a fairly shitty text editor. It will freeze up if you try to open a text file larger than a few hundred kilobytes, or one with lines that are too long for it to properly digest. If you were to think of emacs as a sushi meal prepared by an experienced chef, then you could think of gedit as a microwaved bowl of ramen with a half cup of sriracha sauce dumped on it. I had the ramen right here at home, no cash on hand, and well, I was hungry.

I think the problem is that, in university, there was never any class that was really about teaching us how to use our tools properly and efficiently. I learned about programming languages, operating systems, compiler design, and all those wonderful things, but in all that time, I was never given the opportunity to sit down and think about the tools I was using. The craftsmanship of programming, so to speak. I started using gedit because it was the default on the Linux distro my university ran at the time. I stuck with it out of habit.

During my undergrad, we were given assignments with tight deadlines. We worked our asses off. Some semesters, I remember having as much as 40 hours of homework per week. Needless to say, there was a lot of pressure. Pressure to get things done quickly. Given tight assignment deadlines, I didn’t really feel like spending 10 or 20 hours familiarizing myself with vi or emacs, or dealing with people who would tell me to go RTFM (Read the Fucking Manual). I went with gedit because it was intuitive and comfortable, as shitty as it was.

At my current workplace, we run MacOS, and well, while there is a port of gedit (not even kidding), it’s a fairly miserable experience. The MacOS port of gedit is like one of those freak creatures in a horror movie that begs you to pkill -9 it. Not knowing where to run, I started using the GitHub atom editor. It’s alright, but like gedit, it has its annoying quirks.

I’m 31, and I can’t help but want to take a step back. I don’t know how to use vim, but I think I’d like to learn. I can appreciate that a lot of thought was put into its design. It’s not a gimmicky gadget, it’s a powerful tool: a programmer’s text editor. I’m trying to convince myself that the time investment is worth it. At the very least, I can appreciate that vim is much more cross-platform, and stable across time, than any other text editor I’ve ever put up with.