Technodiversity

The year is 2048. Migration to IPv6 has just been completed. Every object in the world, including your coffee mug and the chair you're sitting on, comprises a manycore RISC-V cluster, running Linux, with its own unique IP address. Haskell, because of its mathematically provable superiority, has come to supplant every other programming language. Writing code in any language other than Haskell is now a misdemeanor, punishable by up to 64 days of bandwidth throttling and a fine of up to 0.125 BTC.

Wouldn't things be much simpler if every computer system were built on the same platform, ran the same operating system, and every program were written in the same programming language? I think that in many ways, this is the wet dream of many programmers. No more dealing with cross-language boundaries, portability issues, and multiple incompatible implementations of subpar standards. Things could be much simpler and more efficient than they are now.

The biggest problem, however, is that in a world where every computer system runs the same software, the same version of the same operating system, on the same hardware, every computer system has exactly the same bugs and security weaknesses. Given that some security flaw exists, a single computer virus could potentially contaminate every computer system in the world in a very short amount of time. In today’s world, this Hollywood-like doomsday scenario seems extremely implausible. The real world is too messy, or too diverse, for it to be practical to build a single virus or worm that could infect every system.

In a lot of ways, the chaos of the technological world resembles that of nature. Like animal species competing for survival, various technologies and standards compete for funding and mindshare. In nature, species specialize to exploit new environments. Diversity, in the technological world, exists in part because specialization makes systems more efficient, which allows the exploitation of new market niches. In nature, genetic diversity, or biodiversity, makes it near-impossible for a single virus to wipe out the entirety of life on earth.

Today’s technological world is definitely messy, but we can take comfort in the fact that competition really does foster innovation. The technological diversity that exists right now is a sign that we live in a thriving ecosystem, as opposed to one that is stagnant. With self-driving cars and a variety of home automation devices on the horizon, we can also take comfort in the idea that technodiversity may actually be keeping us safe.

 

The Brain’s Registers

In the last few years, there's been a lot of impressive work done with neural networks. One of the most interesting things I've seen is Word2vec, a technique for computing word embeddings, that is, for mapping English words to large vectors of neural activations. This latent space has some remarkable properties: concepts that are semantically close end up close together in the vector space, and it's possible to do arithmetic on word vectors, giving us results such as 'queen – woman + man ≈ king'.

The ability to do approximate arithmetic on word vectors is nifty, but what I think is most interesting is that Word2vec shows that we can translate fuzzy human concepts such as English words or sentences into an encoding from which computers can extract semantic information. There are now machine translation models based on recurrent neural networks which take a sentence in a language X, generate an activation vector, and then feed that vector into another recurrent neural network trained to generate text in another language Y. These machine translation models have reached the point where they are competitive with traditional statistical models.

The ability to create an intermediate representation for machine translation is quite powerful, but vectors of neural activations can be used to encode semantic information contained in many things besides written language, and also to transfer information from one modality to another. Researchers at Google have also shown that you can generate captions from images by first encoding images into semantic vectors, and then feeding those vectors into a recurrent neural network which generates text. Hence, a vector of neural activations can be used to represent semantic information (meaning, perhaps even concepts and ideas) in an abstract space that is agnostic to input and output modalities.

I hope this doesn't seem too far-fetched, but I'm going to suggest that it's highly likely that such semantic vectors are used to convey meaning and transfer information between different representations inside the human brain, particularly in the areas of the prefrontal cortex (PFC) that implement higher-level thinking. I'm going to go even further, and make the prediction that the PFC implements something analogous to the registers found in a microprocessor: semantic registers which are used to buffer, store and transfer vectors of neural activations that encode meaning, concepts or ideas.

There is obviously still a lot we don’t know about the human brain, or the way in which it implements higher-level thinking, and the neurons in artificial neural networks are very loose approximations of their biological cousins. Still, bees, bats and birds use wings to fly. These are different kinds of wings, with different properties, which have evolved independently, but they are all wings nonetheless. All three have come to evolve a similar solution to the problem of flight because this solution is natural and efficient. In the same vein, I think you could make the argument that a register is a natural concept when it comes to shuffling and operating on data.

The registers in a microprocessor hold small pieces of temporary data. They store vectors of bits and typically have a few input and output ports. They implement simple operations, such as the ability to reset/erase the contents stored, and the ability to open or close the flow of information to/from various components inside the processor. In digital hardware, opening and closing of data ports is accomplished by gating signals with transistors. In biology, the gating of signals is accomplished by inhibitory neurons.

According to Wikipedia, working memory is “a cognitive system with a limited capacity that is responsible for the transient holding, processing, and manipulation of information”. It's believed that human beings can hold only about seven items at a time in working memory. Could it be that this temporary store of information is the human brain's equivalent of a CPU's register file, built out of a relatively small set of neural registers, with inhibitory neurons gating their outputs? I believe this could begin to explain our amazing (and yet limited) ability to shuffle and operate on concepts and ideas stored in our working memory.

In a typical CPU, the register file is connected to components such as an Arithmetic and Logic Unit (ALU). The ALU can read data from one or more registers, and perform operations such as addition, subtraction and multiplication between the values read, and store the result back into registers. Values can also be compared and the result of such comparisons used to decide which action the CPU should take next. Human beings are not born with the ability to compute signed integer multiplications, but there are certain high-level thinking faculties which may be innate, such as the ability to reason by analogy.

One could imagine that, somewhere in the PFC, there may be a dedicated component which is able to load semantic registers and draw analogies between semantic vectors, to answer the question of whether A is to B as C is to D, and perform some action based on the result. It could be that the part of our brain that does high-level thinking contains a number of operators that connect to one or more semantic registers and perform various operations on the concepts represented. These would be our brain’s ALU: the set of primitives that enable our high-level cognitive abilities.

As Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.” Maybe my attempt to make sense of higher-level human thinking by drawing analogies with computing hardware is misguided. Maybe working as a compiler engineer and looking at machine code for too long has warped my own cognition beyond repair. The ideas I’m proposing here are largely speculative, but maybe I’m right, and maybe, within a decade or two, someone will begin to map out the circuits that constitute the brain’s registers and operators.


Method Call Syntax in a JS-like Language

As a side-project, I’m working on a small JavaScript-like programming language called Plush. Like JavaScript, this language is going to have objects and method calls. In this blog post, I’m going to discuss some seemingly trivial aspects of method call syntax and semantics which illustrate why programming language design is difficult, and solicit your feedback as to potential solutions.

Like JavaScript, Plush has object literals which can be defined with the curly brace syntax. It will also have function objects, which are in fact closures. In JavaScript, you can assign a function to an object property, and it becomes callable as a method, like so:

o = {}; // empty object
o.m = function (x) {…};
o.m(2); // call the function in property m as a method

This all seems fine and dandy, but there are some issues with the way JavaScript implements this. The first is that method call syntax introduces some "hidden" semantics. When you call a JS function as a method, there is a hidden this argument (a reference to the object the method is called on) which gets passed to the function. This is invisible to the programmer. Hence:

// The statement below:
o.m(2);

// Is not equivalent to these statements:
f = o.m;
f(2);

// But it is equivalent to these two statements:
f = o.m;
f.call(o, 2);

The passing of the hidden this argument is a little bit annoying because it breaks the programmer's intuition in a small way. JS syntax makes method calls look like regular function calls, but there is a difference. Sure, it's not a big deal, but I ran into this issue while implementing IO functions in Plush. I wanted to implement some C++ "host" functions which would allow the language to interface with the outside world, like so:

io = import "core.io"; // import the IO module

io.print("hello world!"); // here 'print' is a host function written in C++

Maybe you see where this is going. If I implement method calls as JavaScript does, then every function, including host functions, needs to take a hidden this argument, even if it doesn't need it. I could special-case host functions called as methods so that they behave differently, but then I potentially have to add an extra dynamic check at every function call, which gets annoying. Alternatively, I can wrap my C++ host functions in Plush functions which handle the unneeded this argument.

The second issue I ran into involves the way the this argument is handled. JavaScript has this problem where if you declare a closure inside a method, you can't directly access the parent function's this argument, as described in this blog post. This annoying quirk of the language was "solved" by the introduction of Function.prototype.bind. The JS workaround seemed like an ugly kludge to me, and so I thought: why not just do it like Python? Why not make the this argument explicit, and make programmers declare and name the this value, in the same way that Python forces you to declare the self argument in methods?

Upon first examination, making the programmer explicitly declare the this argument seems like a fairly good idea. However, it does have the annoying consequence that an argument that was previously hidden is now a positional argument. Consider the following scenario:

// An IO module is imported; this produces an object containing IO functions
io = import "core.io";

// The print method is “extracted”, so that we can call it with a shorthand name
print = io.print;

// This works fine: the method call automatically passes io as the this argument to print
io.print(2);

// ERROR: we are not passing a this argument value
print(2);

In the case of modules, it’s clear that the print function shouldn’t even need a this argument value. I pondered this problem, and had the idea that possibly, method calls could have a different syntax from property accesses. The “arrow syntax” would make explicit the passing of the this argument:

// Call of the method m of object o
// The “arrow syntax” is specifically for method calls, and passes a this argument
o->m(2);

// Equivalent function call, passing an explicit this argument
o.m(o, 2);

io = import "core.io";

// Regular function call, print does not take a this argument, it is not a method
io.print(2);

// This works correctly, as one might expect
print = io.print;
print(2);

The solution I came up with is not perfect; it raises some potential problems. For one, with a special arrow syntax for method calls, it becomes possible to call object properties using both regular function calls, and arrow-style method calls which pass a this argument. Experience tells me that if both styles are possible, people are going to use both, which could be messy. For example, what do you do if you have a set of methods which do not need a this argument? Do you declare one anyway? Would you end up with a mix of regular function calls and method calls on the same object?

So, I’ve been using gedit this whole time

This is my coming out. I’m 31. I’ve been programming since I was 16, but all this time, I’ve never learned how to use a “proper” text editor.

It's probably because as a teenager, I grew up with Windows 98, and on that platform, back then, command-line tools were very much second-class citizens. In university, I was introduced to Linux and somehow, over the years, I became hooked. I spent a total of 11 years in university, and over those years, I probably wrote over 400K lines of code. Most of it on Linux, and most of it using a text editor called gedit.

It's the GNOME text editor. Its noteworthy features are syntax highlighting, and the ability to have multiple tabs open at once. It's a fairly shitty text editor. It will freeze up if you try to open a text file larger than a few hundred kilobytes, or one with lines that are too long for it to properly digest. If you were to think of emacs as a sushi meal prepared by an experienced chef, then you could think of gedit as a microwaved bowl of ramen with a half cup of sriracha sauce dumped on it. I had the ramen right here at home, no cash on hand, and well, I was hungry.

I think the problem is, in university, there was never any class that was really about teaching us how to properly and efficiently use tools. I learned about programming languages, operating systems, compiler design, and all those wonderful things, but all this time, I was never given the opportunity to sit down and think about the tools I was using. The craftsmanship of programming, so to speak. I started using gedit because it was the default on the Linux distro my university ran at the time. I stuck with it out of habit.

During my undergrad, we were given assignments with tight deadlines. We worked our asses off. Some semesters, I remember having as many as 40 hours of homework per week. Needless to say, there was a lot of pressure. Pressure to get things done quickly. Given tight assignment deadlines, I didn't really feel like spending 10 or 20 hours familiarizing myself with vi or emacs, or dealing with people who would tell me to go RTFM (Read the Fucking Manual). I went with gedit because it was intuitive and comfortable, as shitty as it was.

At my current workplace, we run macOS, and well, while there is a port of gedit (not even kidding), it's a fairly miserable experience. The macOS port of gedit is like one of those freak creatures in a horror movie that begs you to pkill -9 it. Not knowing where to run, I started using GitHub's Atom editor. It's alright, but like gedit, it has its annoying quirks.

I’m 31, and I can’t help but want to take a step back. I don’t know how to use vim, but I think I’d like to learn. I can appreciate that a lot of thought was put into its design. It’s not a gimmicky gadget, it’s a powerful tool: a programmer’s text editor. I’m trying to convince myself that the time investment is worth it. At the very least, I can appreciate that vim is much more cross-platform, and stable across time, than any other text editor I’ve ever put up with.

My Experience with the ESP8266 – Making an LED Strip I can Control from my Shell

IoT is one of those buzzwords that’s been thrown around so much that it’s become largely synonymous with disappointing marketing hype. Still, home automation, in principle at least, has a lot of potential. The ESP8266 chip came out about two years ago, and it’s been drawing a lot of interest from the “maker” community. This inexpensive chip, for those who don’t know, incorporates a wifi module, flash memory, and a small 32-bit Xtensa core in a tiny package. This has had many people excited, because it means all kinds of electronic projects can be connected to wifi networks for only a few dollars in hardware cost.

I’ve known about the ESP8266 for a while, but until now, it wasn’t so interesting. Early versions of the chip had only a handful of I/O pins. You also needed to install some clunky SDK provided by the vendor to program it, and the instructions were not straightforward. Think tons of dependencies and lots of boilerplate code. Thankfully, this isn’t the case anymore. It’s now possible to get a NodeMCU module on eBay for less than $3.50 shipped, or $7 on Amazon. This module has many I/O pins, its own 3.3V voltage regulator, its own USB interface for programming, and best of all, is programmable with the Arduino IDE.

I’ve recently completed two projects with the ESP8266. One of them is a wifi-enabled power outlet that can be remotely switched on and off. The other is an RGB LED strip whose color can be changed remotely. I work with a Linux machine at home, and needed to update my udev rules in order for the USB interface of my NodeMCU modules to be recognized so that I could program them. Besides that, the whole process has been almost seamless.

The components I used to interface a NodeMCU board with a 12V RGB LED strip

There is a simple "Hello World!" web server example that comes with the Arduino core package, and this example is less than 100 lines long. Making a wifi-enabled LED strip, on the software side, was a simple matter of parsing web request arguments for red, green and blue color values, and translating these into PWM intensity values to control output pins. Hardware-wise, I connected the D0-D2 output pins to three 2N2222 transistors, which can handle the current required to drive my one-meter LED strip. The real beauty of it though, is that I can control the LED strip with shell commands, by issuing HTTP requests, which opens up the realm of scripting home automation:

# Make it red!
wget -O- "http://192.168.0.56/?r=255&g=0&b=0"
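
On the firmware side, the handler can be sketched in a few dozen lines. The following is an untested outline assuming the standard ESP8266 Arduino core and its ESP8266WebServer library; the pin assignments and wifi credentials are placeholders, not the exact code running on my module:

#include <ESP8266WiFi.h>
#include <ESP8266WebServer.h>

ESP8266WebServer server(80);

// Placeholder pin assignments for the pins driving the R, G and B transistor bases
const int PIN_R = D1, PIN_G = D2, PIN_B = D3;

// Parse ?r=..&g=..&b=.. query arguments (0-255) and set the PWM duty cycles
void handleColor() {
  int r = constrain(server.arg("r").toInt(), 0, 255);
  int g = constrain(server.arg("g").toInt(), 0, 255);
  int b = constrain(server.arg("b").toInt(), 0, 255);
  analogWrite(PIN_R, r);
  analogWrite(PIN_G, g);
  analogWrite(PIN_B, b);
  server.send(200, "text/plain", "ok");
}

void setup() {
  pinMode(PIN_R, OUTPUT);
  pinMode(PIN_G, OUTPUT);
  pinMode(PIN_B, OUTPUT);
  analogWriteRange(255);                 // 8-bit PWM so the 0-255 values map directly
  WiFi.begin("my-ssid", "my-password");  // placeholder credentials
  while (WiFi.status() != WL_CONNECTED) delay(100);
  server.on("/", handleColor);           // handles requests like /?r=255&g=0&b=0
  server.begin();
}

void loop() {
  server.handleClient();
}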


I intend to play with more of these in the future. There are now LED strips with individually-addressable LEDs, which seem like a lot of fun. I would also like to make a box with temperature, light and motion sensors that I can query remotely. If you're interested in working with the ESP8266, I personally recommend buying a NodeMCU instead of a bare module. It will make your life much, much simpler.

Silicon Valley, Six Months Later

Maybe you’ve been wondering why I haven’t posted on this blog in over four months. The main reason is that my energy has been going elsewhere. In February, I defended my PhD thesis and relocated to the South SF Bay area (aka Silicon Valley) to work in tech. Most of my energy, since then, has been spent adapting to a new job and settling into a new environment. To be completely honest, it’s been difficult. I defended my thesis, prepared for an international move and started a new job all within one month. Hindsight is 20/20: if I had to do it over, I would take two or three months off after completing my degree to give myself a buffer to complete this transition.

Six months later, I have mixed feelings. I’m not sure how I feel about Silicon Valley. I picked an apartment in the South SF Bay because I wanted to be close to work. I’m able to bike to work, which is nice. However, as someone who grew up and lived all her life in a big city, the area feels very suburban and well, boring, to me. In all likelihood, part of the problem is that I don’t have many friends here. Once the work day is over, I often feel very lonely. It might sound cheesy, but this is probably what I regret most about leaving my hometown: leaving my friends and family behind.

I try to look at the bright side: I'm happy with my job. I like my boss and my coworkers. People here work reasonable hours and the work is challenging. I have my own place for the first time in my life. It's a big apartment in a quiet neighborhood, and it's a comfortable place to live. Silicon Valley is a huge tech hub, so it's a good opportunity for me to learn a lot. Because I don't go out much these days, I've actually made a decent amount of progress on my programming language project in my spare time. I also installed an electric motor on my bike and completed some electronics projects. These things give me some satisfaction. Still, loneliness is hard, and I don't quite feel at home here yet.

 

Optimizing Ray Marching through Partial Evaluation

I love ray tracing as a 3D rendering technique. I love it because in its purest form, it's very elegant and beautiful. It's a simulation of light and photons flowing through an environment. Because of its physically-based nature, ray tracing can produce very realistic images from very simple code. You can render impressive scenes with just a few hundred lines of C. This is in stark contrast with state-of-the-art polygon rasterization techniques used in modern 3D engines, which make use of a huge number of very complex hacks to get something that tries to look somewhat realistic.

Traditional Whitted-style ray tracing is done with intersection tests, that is, the scene is rendered by testing if rays coming out of the camera and going into the scene intersect with various objects. With this style of rendering, in order to be able to render a 3D object, you need to be able to write a piece of code that checks whether or not a given ray intersects with the object, and if so, where the nearest intersection occurs (there might be multiple). This is relatively straightforward if you want to render something like a sphere, but it quickly gets more complicated when rendering more complex parametric surfaces.

Recently, I fell in love when I discovered a ray tracing technique that in many ways is even simpler yet more powerful than classical ray tracing. The technique, known as ray marching, or sphere tracing, makes it possible to render a scene that is directly defined by a Signed Distance Function (SDF). This is a function that, given a point in space, returns the closest distance to any point that is part of the scene. The distance is positive if the point is outside of scene objects, zero at the boundary, and negative inside scene objects.

The beauty of this is that SDFs are extremely simple to define. For instance, the signed distance from a point to a sphere is simply the distance between the point and the center of the sphere, minus the sphere’s radius. Better yet, SDFs can be combined together in very simple ways as well. You can produce the union of two objects by computing the minimum of both of their SDFs, and the intersection by computing the maximum. You can also do funky things such as infinitely repeating objects using modulo operators. SDFs truly have a lot of expressive power due to their combinatorial nature.
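
To make this concrete, here is a minimal C++ sketch (the same logic would normally live in a GLSL fragment shader) of a scene built from two sphere SDFs combined with a union, plus the basic sphere tracing loop. The Vec3 helper, the constants and the scene itself are illustrative placeholders, not code from an actual renderer:

#include <cmath>
#include <algorithm>

// Minimal 3D vector type, just enough for this sketch
struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float length(Vec3 v) { return std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z); }

// Signed distance to a sphere: distance to the center, minus the radius
float sdSphere(Vec3 p, Vec3 center, float radius) {
    return length(sub(p, center)) - radius;
}

// Combining SDFs: union is the minimum of the distances, intersection the maximum
float opUnion(float d1, float d2)        { return std::min(d1, d2); }
float opIntersection(float d1, float d2) { return std::max(d1, d2); }

// The whole scene as a single SDF: two overlapping spheres
float sceneSDF(Vec3 p) {
    float a = sdSphere(p, Vec3{-0.5f, 0.0f, 5.0f}, 1.0f);
    float b = sdSphere(p, Vec3{ 0.5f, 0.0f, 5.0f}, 1.0f);
    return opUnion(a, b);
}

// Sphere tracing: the SDF guarantees a sphere of empty space around the current
// point, so we can safely step along the ray by that distance and repeat
bool rayMarch(Vec3 origin, Vec3 dir, float maxDist, float &hitDist) {
    float t = 0.0f;
    for (int i = 0; i < 128 && t < maxDist; ++i) {
        Vec3 p = {origin.x + dir.x*t, origin.y + dir.y*t, origin.z + dir.z*t};
        float d = sceneSDF(p);
        if (d < 1e-4f) { hitDist = t; return true; }
        t += d;
    }
    return false;
}

Rendering is then just a matter of calling rayMarch once per pixel, with the ray direction derived from the camera and that pixel's position on the image plane.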

The demoscene group Mercury has produced some amazing demos of 3D animations based on the rendering of SDFs on the GPU. These demos are entirely procedural and fit in a tiny 64 kilobytes! The Timeless demo, in particular, showcases the kinds of spatial transformations and deformations of space that can be done when your scene is entirely defined by a mathematical function, as opposed to polygons. There are many interesting SDF code examples on ShaderToy if you're interested in playing with them.

Hopefully, by this point, I’ve convinced you that SDFs and ray marching are insanely cool. It’s also pretty amazing that modern GPUs are now fast and flexible enough that you can render fairly interesting SDFs in real-time. Unfortunately, SDFs remain expensive enough to render that it’s still tricky to render complex scenes. I think it would be great to live in a world where SDFs and real-time ray tracing could replace polygons, but it seems we aren’t quite there yet.

I spent some time thinking about SDFs, and one question kept coming back: there's got to be a way to make these faster. If you think about it, rendering an SDF is costly because there's quite a bit of computation going on at every pixel: many evaluations of a potentially complex SDF. This computational work, however, is highly redundant. Do you really need to evaluate the entire SDF for every single pixel? Ray marching is very much a brute-force technique. There just has to be a more efficient way to go about this.

SDFs are mathematical functions, pieces of code that get repeatedly evaluated. I come from a compiler background, and so I started wondering if, potentially, compiler optimizations could be applied to these SDFs. What if, for instance, we could apply partial evaluation to optimize these distance functions and make them run faster? I did some research, and it turns out, unsurprisingly, that I wasn't the first one to think of such a concept. There has already been work on applying partial evaluation to ray tracing and on applying partial evaluation to the optimization of shaders.

The work that's been done on applying partial evaluation to ray tracing, in my opinion, doesn't go far enough. The authors have essentially partially evaluated the ray tracing algorithm and specialized it with respect to the objects present in a given 3D scene. What I think would be very useful is to specialize the evaluation of SDFs with respect to the camera position. What if, for instance, you could mathematically prove that certain objects in your scene are not present in the left half of the view frustum? That is, what if you could prove that most of the objects in your scene are not visible on the left half of the screen?

It seems like it would be relatively straightforward to recursively subdivide the image and view frustum into multiple quadrants and determine which objects will definitely not be visible in each. In fact, I know this can be done directly by using the definition of SDFs, without any fancy tricks. If you could do this, then you could generate a separate, optimized SDF for each smaller fraction of the image to be rendered. What I'm essentially proposing is to generate optimized pieces of shader code on the fly for smaller areas of the screen, which can be evaluated much faster than the SDF for the entire scene.
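
Here is a rough sketch of the kind of test I have in mind. The names and structure are hypothetical, not a working implementation: enclose a sub-region of the view frustum in a bounding sphere, and use each object's own SDF to conservatively prove that the object cannot be hit anywhere inside that region:

#include <vector>
#include <functional>

struct Vec3 { float x, y, z; };      // same minimal vector type as in the earlier sketch
using SDF = std::function<float(Vec3)>;

// Keep only the objects that might be visible inside a region of space bounded
// by a sphere (regionCenter, regionRadius). If an object's SDF evaluated at the
// center is greater than the radius, the object lies entirely outside the
// bounding sphere, and therefore outside the sub-frustum it encloses.
std::vector<SDF> pruneForRegion(const std::vector<SDF> &objects,
                                Vec3 regionCenter, float regionRadius) {
    std::vector<SDF> visible;
    for (const SDF &obj : objects) {
        if (obj(regionCenter) <= regionRadius)
            visible.push_back(obj);
    }
    return visible;
}

The surviving objects would then be min-combined into a smaller, specialized SDF for that region, and the same test could be applied recursively to each subdivided quadrant.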

I don’t know how viable it is to compile and run fresh shader code every time the camera moves on a GPU, but I believe it might actually be possible, using this kind of optimization, to render SDFs on a multicore desktop CPU at decent frame rates.