
What Killed Smalltalk?

I’ve been thinking about designing my own programming language for a long time. I’ve actually been keeping a lot of notes, and even throwing together some code when I can find time. My plan is to build something that takes inspiration from LISP, JavaScript and Smalltalk. I think there’s a niche to be filled. There are many new programming languages coming out lately, but most of them are statically typed and compiled ahead of time. The dynamic languages that do come out often don’t perform very well (see CPython, Ruby) or have poorly thought out semantics.

I’ve written a few blog posts pointing and laughing at perceived failures of JavaScript, but the truth is that programming language design is hard. There’s no limit to the complexity of the things you can build with programming code. No matter the design choices you make, no matter the language you design, there are bound to be some inconsistencies and weaknesses somewhere. I think that Smalltalk is a very inspiring programming language, revolutionary in many ways, but it’s also one that has gone extinct. It’s interesting, in my opinion, to ask ourselves why Python thrives but Smalltalk died.

Like LISP, Smalltalk implemented some interesting features which have influenced other languages (such as Java and JavaScript). Some nifty features of Smalltalk are really cool, but still aren’t implemented in any other language. For instance, Smalltalk had the ability to suspend running programs into a saved image, and resume execution later at the saved point. As you can imagine, this is extremely powerful and useful. Forget saving documents or saving games, just save the state of an entire program, no implementation effort required.

I found an interesting talk on YouTube titled “What Killed Smalltalk could Kill Ruby Too”.

Robert Martin makes the case that one of the big weaknesses of Smalltalk is that it was just “too easy to make a mess”. Smalltalk was highly dynamic, and encouraged people to “monkey patch” things and do quick fixes/hacks. He also makes the point that Smalltalk just “didn’t play well with others”. When you think about it, Smalltalk had its own source control, IDE and GUI built into live images, living alongside your program. Smalltalk isn’t just a language, it’s an operating system and a way of life. It’s conflating things that would maybe be best left separate.

It seems to me that in some key areas, the Smalltalk creators placed their own radical ideas above everything else. They chose idealism over pragmatism. Smalltalk was a language created with a grandiose vision. It had some deeply rooted principles which didn’t necessarily work so well in practice, such as the idea that everything had to be an object, that the object metaphor should be applied everywhere, one size fits all. At the end of the day, programmers want to get things done and be productive. If the language design or implementation gets in the way of getting things done, people will leave. Pragmatism is key for a programming language to succeed.

Smalltalk was also designed with the idea that it should be easy to learn and intuitive. This led its creators to focus heavily on graphical user interfaces. I watched an introduction to Self on YouTube (Self is a direct descendant of Smalltalk) and saw the heavy emphasis on interacting with objects through UIs. The user interfaces showcased in this video are, in my opinion, horribly complex and unintuitive. Pretty much all of the interactions done through the UI would have been simpler and easier to understand if they had been done by writing one or two lines of code instead!

When you sit down and think about it for one second, you have to realize that programming doesn’t fundamentally have anything to do with graphical user interfaces. Yes, you can use programming code to create GUIs, but there is no reason that programming should have to involve GUIs and be tied to them. The metaphor of writing code has been extremely successful since the very beginning, and it probably makes more sense to the mathematical mind of a skilled programmer. Not everything has to have a visual metaphor. This is again a case of pushing some idealistic principle too far, in my opinion.

I believe that a lack of pragmatism is something that has killed many languages. Not just Smalltalk, but Scheme too. My first experience with Scheme involved trying and failing to install multiple Scheme distributions because I couldn’t get all the dependencies to work. Then, finally getting a Scheme compiler installed, and struggling to implement simple routines to parse text files, because Scheme doesn’t include the most basic string routines. The Scheme compiler I’d selected bragged that the code it produced was highly optimized, but once I finally managed to write my own string routines, I compiled my program, ran it, and it was dog slow. Parsing a one-megabyte CSV spreadsheet took over a minute. I ended up rewriting the code in Python. Why don’t more people code in Scheme? Because they try to realize their ideas in Scheme, and it just doesn’t quite work out.

JavaScript is the C++ of the Web


When I started my PhD, back in 2009, I told my advisor I wanted to work on optimizing dynamic programming languages. A big part of my thesis was going to involve the implementation of a JIT compiler for some dynamic language, and so our discussion rapidly became focused on which language I should be working with. In the end, we ended up choosing JavaScript. It was a good compromise: a widely-used “real-world” programming language, warts and all, that was still small enough for one person to realistically implement a compiler for. The ECMAScript 5 specification was around 250 pages long, and I read the whole thing from cover to cover before I began working on Higgs.

Since then, I feel I’ve been watching JavaScript go the way of C++: it’s becoming a “kitchen sink” language. So many new features have been added that the new ES6 specification document is literally twice the length of the ES5 specification. Worse yet, a year before the ES6 specification was even completed, there was already a laundry list of features scheduled for integration into ES7. They weren’t nearly finished with ES6, and they were already planning ES7. There are a number of semantic inconsistencies in JavaScript that need fixing, but the ES6 and ES7 additions do nothing to fix those; they merely add new features (read: complexity) to the language.

Personally, I’m a big fan of simplicity and minimalism in programming language design. I think that smaller languages have the potential to be easier to implement, optimize, teach, debug and understand. The bigger your language, the more semantic warts will pop out and the more behavioral inconsistencies are going to occur between different VM implementations. If JavaScript is really “the assembly language of the web”, then why does it need all these high-level features? The logical thing to do would have been to freeze as much of the JS semantics as possible, and focus on improving support for JS as a compiler target. I believe that the answer as to why JS keeps growing is largely design by committee.

Of course I’m biased. I implemented my own JavaScript JIT compiler and the fact is, I’m too busy to keep up with all these new additions. Still, it seems to me that in the web world, nobody takes the time to pause, breathe and think things out for even a moment. Case in point: Mozilla made a lot of noise with asm.js, a standard for compiling native code to JS that was allegedly better than Google’s Native Client. I think asm.js is still new enough that developers haven’t really had any time to adopt it. It’s only been used in tech demos, but Mozilla and Google are already working on WebAssembly, which in all likelihood will make asm.js irrelevant. Think about that for a second: asm.js, which is still very new (it came out in 2013, so it’s only two years old), is already mostly irrelevant, before anyone even had time to adopt it.

WebAssembly is essentially what Brendan Eich told us we didn’t really want or need: a bytecode format for the web. A somewhat more neutral platform for all compilers to target. As a compiler implementer, it still seems to me like it’s a bit of an unfortunate compromise: a way to retrofit a web-bytecode into JavaScript VMs. It’s going to take programs encoded as Abstract Syntax Trees (ASTs) as input, whereas GCC, clang, and other real-world compilers usually generate Control Flow Graphs (CFGs) at the output stage, not ASTs. Forcing compilers to convert CFGs back into ASTs seems like a decision made to simplify the job of WebAssembly VM implementers, at the expense of everyone else.

All Possible Thoughts


I’ve recently been thinking about the topic of originality. You’ll often hear people say that “it’s all been done before” and “what’s old is new again”. The world population has recently passed the 7 billion mark. According to some estimates, there have been up to 120 billion human beings alive since the dawn of humanity. In a world so big, it’s hard to believe you’re unique. It’s easy to feel irrelevant and worthless. Some philosophers have even tried to make the argument that all possible thoughts have been thought of before, leaving you no chance of ever coming up with anything original. After all, human beings have existed for hundreds of thousands of years, and if there’s been 120 billion of us so far, there’s been a lot of thinking going on.

I think the best way to answer this question is with a thought experiment. We don’t know enough about neuroscience to exactly define what a “thought” constitutes. I’ll make some simplifying assumptions to give us some chance to grasp at this problem.

Let’s imagine that:

  • Thoughts are patterns of neural firings in a small cluster of 512 neurons in your brain.
  • Every human being has this same neural cluster.
  • The wiring of the thought cluster is entirely fixed and identical in every individual, not affected by environment or genetics.
  • Neurons in the thought cluster fire in a synchronized manner, 1000 times per second.

In this imagined view, each thought is representable by a boolean vector of 512 bits, and any brain can have up to 1000 thoughts per second. In our imagined, simplified world, there are (2^512) ~= 1.34×10^154 possible thoughts in total.

Using some back of the envelope math, assuming there have been 120 billion human beings alive so far, each living for 100 years, each having up to 1000 possible thoughts per second, this gives us:

1000 × (365 × 24 × 60 × 60) × 100 ≈ 3.2×10^12 thoughts per human being over a 100 year lifespan.

Hence (120×10^9) × (3.2×10^12) = 3.84×10^23 thoughts happened so far, out of 1.34×10^154 possible thoughts.
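For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same estimate. The constant names are my own, and the numbers are just the simplifying assumptions listed above. Note that without rounding the per-lifetime figure up to 3.2×10^12, the total comes out slightly lower, at roughly 3.78×10^23.

```python
# Back-of-the-envelope estimate using the simplifying assumptions above.
# All constant names are illustrative, not taken from any library.
NEURONS = 512                 # size of the imagined "thought cluster"
FIRINGS_PER_SECOND = 1000     # synchronized firing rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
LIFESPAN_YEARS = 100
HUMANS_EVER = 120 * 10**9     # upper estimate of humans who have ever lived

# Each thought is a 512-bit boolean vector, so there are 2^512 of them.
possible_thoughts = 2 ** NEURONS

thoughts_per_year = FIRINGS_PER_SECOND * SECONDS_PER_YEAR
thoughts_per_lifetime = thoughts_per_year * LIFESPAN_YEARS
thoughts_so_far = HUMANS_EVER * thoughts_per_lifetime

# Fraction of the space of possible thoughts explored so far.
fraction_explored = thoughts_so_far / possible_thoughts

print(f"possible thoughts:     {possible_thoughts:.2e}")
print(f"thoughts per year:     {thoughts_per_year:.2e}")
print(f"thoughts per lifetime: {thoughts_per_lifetime:.2e}")
print(f"thoughts so far:       {thoughts_so_far:.2e}")
print(f"fraction explored:     {fraction_explored:.2e}")
```

Even under these very generous assumptions, the fraction explored is more than 130 orders of magnitude below one.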

You might be wondering what the point of this was. My example is obviously ridiculous. Human thoughts likely are not patterns of firings in a cluster of 512 neurons. We have tens of billions of neurons in our brains, each with thousands of synapses, and our neurons do not fire according to a synchronous clock like a modern silicon chip. Furthermore, each brain’s connectivity is uniquely affected by a combination of both environment and genetics, and hence, no two people have exactly the same neurons and synapses in the same place.

The point is that the estimate of 1.34×10^154 possible thoughts is probably off by one hundred orders of magnitude. However, the estimate of 3.2×10^10 thoughts per year per human being may actually be generous. Hence, I surmise that not every possible thought has been thought. Far from it. The universe will likely dissipate before that has any chance of happening.

Feels like Censorship


I just got informed that my second paper on basic block versioning, an extension of my previous work, has been rejected. Most academics don’t really talk about these things. You probably shouldn’t publicly say that your paper has been rejected, because you want to project some kind of image of never-ending flawless success. The calculated, business-like, aseptic thing to do is to keep quiet, rework your paper, submit it somewhere else, rinse and repeat.

I’m talking about it. I need to let out some steam, express my frustrations a little bit. If that’s a bad career move, well, so be it. I don’t want to spend my life hiding behind a façade, pretending I’m perfect and always cheerful. Living life without ever expressing yourself is a fast path to depression, if you ask me. At the moment, I’m both frustrated and sad. I’ve spent months working on this paper. It was a good paper. Somehow though, it wasn’t good enough. It didn’t make the cut. Better luck next time. Call me cynical, but it is a little depressing considering this conference has an acceptance rate of about 45%. Damn.

I’ve worked with a conference’s program committee before. I’ve had to evaluate a paper about a programming language that consisted of a hand-written AST encoded in XML, with no tool support. I think the paper was 8 pages long. They were pitching this as a revolutionary new idea. This was back in the day of the XML-all-the-things craze. Are you telling me that my latest submission is in the same category as the XML one? I guess when it comes to computer science conferences, you’re either a zero or a one. There is no middle ground. Your idea is either deserving of publication, or piped into /dev/null.

The perverse thing is that this constant stream of rejection discourages exploration. As an academic, you really want your papers to get accepted. Your funding and ultimately your academic career depend on it. I’ve already started to adapt the way that I work. When I started my PhD, I had no idea how the paper game was played. Now, when I have a new idea for my research, I have to ask myself: is this publishable? It’s really interesting, it has a lot of potential, but is it publishable?

To publish your idea, you should craft the smallest possible publishable unit. It needs to be sexy and trendy. It needs to be about JavaScript. It needs to reference as many recent papers as possible, and ideally, point in the same direction as those papers. Contradicting established wisdom is not smart. Suggesting alternatives to the established wisdom is not very smart either. You’re contradicting iron-clad, proven, mathematical facts, which means you are wrong.

The reason why conferences have limited acceptance rates dates back to the days when conference papers were published in these books called “proceedings” which were purchased, printed and shipped in the mail. You couldn’t accept every paper, it wasn’t physically or financially possible. Nowadays, it’s estimated the Google server farms have a total storage capacity rated in multiple exabytes. Conceivably, we could make all submissions to all conferences available on conference websites.

Why do so many computer science papers come without any source code? Because the current practices in our field discourage replication and encourage “massaging” of results. In the spirit of transparency, we could make all submissions available, along with all of the reviewer comments. Maybe we don’t want all papers to be on the same footing. Maybe your paper would get ranked into class A, B, C or D, maybe you’d get some score on a 5 or 10 point scale. Certainly, not everyone could realistically be invited to come and give a talk. Still, is there really a need to silently discard 50 to 90% of all submissions to a conference?

It feels like censorship. When a paper is rejected, it strongly discourages further exploration of that research avenue. You’re telling me that my idea doesn’t deserve to be seen. Worse, you’re giving my academic competitors a chance to beat me to the punch. Science is about proving and disproving things, but it’s also about playing with ideas. In the world of computer science conferences, there’s very little room for disproving anything, and even less room for playing with ideas. We don’t have time for that. The next conference deadline is coming up real soon, and we have funding applications to write. Peer reviews can become peer pressure, a civilized form of hazing.

Fortunately, my paper is already online on arXiv. It’s timestamped. It’s out there. I don’t know if I’ll have time to publish this paper at an academic conference before the end of my PhD: I’m being pressed to finish as soon as possible, and to submit more papers. If it gets rejected one or two or three more times, it might never get into any conference. I can at least take some comfort in the idea that some of my research was published, and my latest work is out there. It might inspire someone to explore a similar research direction.

My personal opinion is that academic research in compilers is dying. It’s going to go the way of operating systems research. Why? Because there’s too much infrastructure to build. It takes too long. It’s just not practical to publish about. These days, the game-changing, innovative work in compilers is largely happening in the industry, and it’s being done by people who left academia.

Presented at ECOOP


This week I am in Prague, at the European Conference on Object Oriented Programming (ECOOP) to present my research on Basic Block Versioning. Getting to ECOOP was fairly stressful. I was flying overnight but can never manage to sleep on airplanes. Sleep-deprived, I had to run like mad in an attempt to make an impossible connection in Paris. The Charles de Gaulle airport is organized in such a way that I had to wait for two shuttle buses and go through security twice. Fortunately, the Paris-Prague flight was slightly delayed, and I barely made the connection, but my checked luggage did not.

I presented my paper Wednesday afternoon. The talk went very smoothly and the audience questions were rather friendly. The paper is now available online from the publisher if you’re interested in reading it. I was very happy to see that my talk and all others were filmed. The video is not yet available, but I have uploaded the slides. In addition to giving a talk, I also presented a poster explaining the main aspects of my paper. I was pleasantly surprised when they informed me that I had won the distinguished poster award.

There are many interesting people here, including VM engineers from Mozilla and Google, Brendan Eich and Bjarne Stroustrup. I had the privilege of visiting tourist sites, sharing a meal and discussing VM design with Carl Friedrich Bolz (of PyPy fame) and Sam Tobin-Hochstadt. My main regret is that I’ve had a very difficult time adapting to the local time zone. I’m sleeping poorly at night and crashing every afternoon. As a result, I’ve missed many interesting talks. I’m looking forward to the recorded videos being uploaded. The VM and language design talks from Curry On are of particular interest to me.

Typed Shapes Paper Submitted

I haven’t written much about my progress on Higgs recently. This is because I’ve been busy writing and submitting conference papers. My ECOOP paper about basic block versioning has been accepted, and I just submitted a new paper about typed object shapes to DLS 2015. In July, I’ll be presenting my research at ECOOP in Prague, and then I’ll begin work on a third paper about interprocedural basic block versioning, which will complete my PhD work.

My thesis advisor has suggested that I construct my thesis as a series of conference papers. I already have two papers published about my work with JavaScript. The paper I just submitted and the next one will also be included, for a total of four papers. If everything goes well, I should be submitting my thesis in December 2015 and defending it in February 2016. After nearly 6 years in the PhD program, I can finally see the light at the end of the tunnel!

The unfortunate thing, for me, is that coding and working on Higgs is the part of my research that I enjoy the most (what can I say, I like building systems), and I haven’t had time to do much coding recently. Fortunately, there are a few more improvements to Higgs that I’ll implement for my next paper. I’ll be improving the calling convention, which should increase the performance of function calls quite a bit. I also have plans to improve the performance of global variable accesses even further.

Maybe we ought to have Numerical Coprocessors?

The past decade has seen the rise of GPGPUs. We’re leveraging the tremendous computational power of graphics cards to accelerate computationally intensive applications such as machine learning, video compression and sorting. Unfortunately, GPGPU is somewhat slow to gain adoption. There are multiple issues involved, such as the need for special GPU/numerical programming languages, complex drivers, vendor-specific differences, and the overhead of having to shuffle data in and out of a separate memory hierarchy.

I was recently reading a blog post claiming that matrix multiplication (GEMM) is the most expensive operation in deep learning, taking up to 95% of the execution time. This got me thinking that maybe GPGPUs are simply not ideal for most applications. Maybe future CPUs should begin to include numerical coprocessors. Sure, we already have SIMD, but the way it’s implemented on x86 CPUs is awkward and relatively inefficient, forcing you to deal with multithreading, prefetching, and SIMD registers of small fixed sizes. Every few years, Intel adds support for new instructions with new SIMD register sizes, rendering your code outdated (yuck). To do SIMD well, you basically write or generate code specialized for a specific CPU model, and even then, it’s just not that fast.

I believe that the Cray 1, which came out in 1975, had the right idea. You write a small loop (kernel) and let the CPU handle the memory traffic and looping. What I’m thinking of is essentially a CPU core which is optimized to run parallel-for instructions on a reduced numerical instruction set. Imagine having a specialized CPU core that shares the same memory space as other CPU cores and can handle prefetching, parallelization and low-level optimizations according to its capabilities, without you needing to change your code. Imagine not needing a driver or threads to make use of this core. Imagine how fast matrix multiplication or cross product could be if you had native hardware support for it.
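To make the programming model a bit more concrete, here is a rough Python sketch of what writing such a kernel might look like. The `parallel_for` function is purely hypothetical: it stands in for the imagined hardware parallel-for instruction and simply runs the iterations sequentially here. The point is only that each iteration is independent, so the core would be free to schedule, prefetch and parallelize them however it likes.

```python
def parallel_for(count, kernel):
    """Hypothetical stand-in for a hardware parallel-for instruction.

    This version just runs the iterations one after another; the imagined
    numerical core could run them in any order, or all at once.
    """
    for index in range(count):
        kernel(index)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]

    def kernel(flat_index):
        # Each iteration computes exactly one output element and touches
        # no other element of `out`, so iterations are independent.
        i, j = divmod(flat_index, cols)
        out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))

    parallel_for(rows * cols, kernel)
    return out


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The code that calls `parallel_for` says nothing about threads, register widths or cache behavior, which is exactly the kind of detail I would like the hardware to own.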

