I’ve recently been thinking about the topic of originality. You’ll often hear people say that “it’s all been done before” and “what’s old is new again”. The world population has recently passed the 7 billion mark. According to some estimates, there have been up to 120 billion human beings alive since the dawn of humanity. In a world so big, it’s hard to believe you’re unique. It’s easy to feel irrelevant and worthless. Some philosophers have even tried to make the argument that all possible thoughts have been thought of before, leaving you no chance of ever coming up with anything original. After all, human beings have existed for hundreds of thousands of years, and if there’s been 120 billion of us so far, there’s been a lot of thinking going on.
I think the best way to answer the question of whether every possible thought has already been thought is with a thought experiment. We don’t know enough about neuroscience to define exactly what constitutes a “thought”. I’ll make some simplifying assumptions to give us some chance of getting a grip on the problem.
Let’s imagine that:
- Thoughts are patterns of neural firings in a small cluster of 512 neurons in your brain.
- Every human being has this same neural cluster.
- The wiring of the thought cluster is entirely fixed, and identical in every individual, not affected by environment or genetics.
- Neurons in the thought cluster fire in a synchronized manner, 1000 times per second.
In this imagined view, each thought is representable by a boolean vector of 512 bits, and any brain can have up to 1000 thoughts per second. In our imagined, simplified world, there are (2^512) ~= 1.34×10^154 possible thoughts in total.
Using some back of the envelope math, assuming there have been 120 billion human beings alive so far, each living for 100 years, each having up to 1000 possible thoughts per second, this gives us:
1000 * (365 * 24 * 60 * 60) * 100 ~= 3.2 * 10^12 thoughts per human being over a 100 year lifespan.
Hence (120 * 10^9) * (3.2 * 10^12) = 3.84×10^23 thoughts happened so far, out of 1.34×10^154 possible thoughts.
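The back-of-the-envelope arithmetic above can be checked with a few lines of Python. This is only a sketch of the thought experiment: the constant names are my own, and the values are the simplifying assumptions listed earlier, not real neuroscience.

```python
# Assumptions taken from the thought experiment (not real neuroscience)
NEURONS = 512                # size of the imagined "thought cluster"
THOUGHTS_PER_SECOND = 1000   # synchronized firing rate
LIFESPAN_YEARS = 100         # generous lifespan for every human
HUMANS_EVER = 120 * 10**9    # upper estimate of humans who have ever lived

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Each thought is a 512-bit boolean vector
possible_thoughts = 2 ** NEURONS

# Upper bound on thoughts per person and for all of humanity
thoughts_per_lifetime = THOUGHTS_PER_SECOND * SECONDS_PER_YEAR * LIFESPAN_YEARS
thoughts_so_far = HUMANS_EVER * thoughts_per_lifetime

# Fraction of the thought space explored so far
fraction_explored = thoughts_so_far / possible_thoughts

print(f"possible thoughts:     {possible_thoughts:.2e}")
print(f"thoughts per lifetime: {thoughts_per_lifetime:.2e}")
print(f"thoughts so far:       {thoughts_so_far:.2e}")
print(f"fraction explored:     {fraction_explored:.2e}")
```

Without rounding, the total comes out to about 3.78×10^23, slightly below the 3.84×10^23 quoted above, which uses the rounded figure of 3.2 * 10^12 thoughts per lifetime. Either way, the fraction of the thought space explored is on the order of 10^-131.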
You might be wondering what the point of this was. My example is obviously ridiculous. Human thoughts likely are not patterns of firings in a cluster of 512 neurons. We have tens of billions of neurons in our brains, each with thousands of synapses, and our neurons do not fire according to a synchronous clock like a modern silicon chip. Furthermore, each brain’s connectivity is uniquely affected by a combination of both environment and genetics, and hence, no two people have exactly the same neurons and synapses in the same place.
The point is that the estimate of 1.34×10^154 possible thoughts is probably off by one hundred orders of magnitude. However, the estimate of 3.2 * 10^10 thoughts per year per human being may actually be generous. Hence, I surmise that not every possible thought has been thought. Far from it. The universe will likely dissipate before that has any chance of happening.
I just got informed that my second paper on basic block versioning, an extension of my previous work, has been rejected. Most academics don’t really talk about these things. You probably shouldn’t publicly say that your paper has been rejected, because you want to project some kind of image of never-ending flawless success. The calculated, business-like, aseptic thing to do is to keep quiet, rework your paper, submit it somewhere else, rinse and repeat.
I’m talking about it. I need to let out some steam, express my frustrations a little bit. If that’s a bad career move, well, so be it. I don’t want to spend my life hiding behind a façade, pretending I’m perfect and always cheerful. Living life without ever expressing yourself is a fast path to depression, if you ask me. At the moment, I’m both frustrated and sad. I’ve spent months working on this paper. It was a good paper. Somehow though, it wasn’t good enough. It didn’t make the cut. Better luck next time. Call me cynical, but it is a little depressing considering this conference has an acceptance rate of about 45%. Damn.
I’ve worked with a conference’s program committee before. I’ve had to evaluate a paper about a programming language that consisted of a hand-written AST encoded in XML, with no tool support. I think the paper was 8 pages long. They were pitching this as a revolutionary new idea. This was back in the day of the XML-all-the-things craze. Are you telling me that my latest submission is in the same category as the XML one? I guess when it comes to computer science conferences, you’re either a zero or a one. There is no middle ground. Your idea is either deserving of publication, or piped into /dev/null.
The perverse thing is that this constant stream of rejection discourages exploration. As an academic, you really want your papers to get accepted. Your funding and ultimately your academic career depend on it. I’ve already started to adapt the way that I work. When I started my PhD, I had no idea how the paper game was played. Now, when I have a new idea for my research, I have to ask myself: is this publishable? It’s really interesting, it has a lot of potential, but is it publishable?
The reason why conferences have limited acceptance rates dates back to the days when conference papers were published in these books called “proceedings”, which were printed, purchased and shipped in the mail. You couldn’t accept every paper: it wasn’t physically or financially possible. Nowadays, it’s estimated that Google’s server farms have a total storage capacity of multiple exabytes. Conceivably, we could make all submissions to all conferences available on conference websites.
Why do so many computer science papers come without any source code? Because the current practices in our field discourage replication and encourage “massaging” of results. In the spirit of transparency, we could make all submissions available, along with all of the reviewer comments. Maybe we don’t want all papers to be on the same footing. Maybe your paper would get ranked into class A, B, C or D, maybe you’d get some score on a 5 or 10 point scale. Certainly, not everyone could realistically be invited to come and give a talk. Still, is there really a need to silently discard 50 to 90% of all submissions to a conference?
It feels like censorship. When a paper is rejected, it strongly discourages further exploration of that research avenue. You’re telling me that my idea doesn’t deserve to be seen. Worse, you’re giving my academic competitors a chance to beat me to the punch. Science is about proving and disproving things, but it’s also about playing with ideas. In the world of computer science conferences, there’s very little room for disproving anything, and even less room for playing with ideas. We don’t have time for that. The next conference deadline is coming up real soon, and we have funding applications to write. Peer reviews can become peer pressure, a civilized form of hazing.
Fortunately, my paper is already online on arXiv. It’s timestamped. It’s out there. I don’t know if I’ll have time to publish this paper at an academic conference before the end of my PhD, since I’m being pressed to finish as soon as possible and to submit more papers. If it gets rejected one or two or three more times, it might never get into any conference. I can at least take some comfort in the idea that some of my research was published, and my latest work is out there. It might inspire someone to explore a similar research direction.
My personal opinion is that academic research in compilers is dying. It’s going to go the way of operating systems research. Why? Because there’s too much infrastructure to build. It takes too long. It’s just not practical to publish about. These days, the game-changing, innovative work in compilers is largely happening in the industry, and it’s being done by people who left academia.
This week I am in Prague, at the European Conference on Object Oriented Programming (ECOOP) to present my research on Basic Block Versioning. Getting to ECOOP was fairly stressful. I was flying overnight but can never manage to sleep on airplanes. Sleep-deprived, I had to run like mad in an attempt to make an impossible connection in Paris. The Charles de Gaulle airport is organized in a way that I had to wait for two shuttle buses and go through security twice. Fortunately, the Paris-Prague flight was slightly delayed, and I barely made the connection, but my checked luggage did not.
I presented my paper Wednesday afternoon. The talk went very smoothly and the audience questions were rather friendly. The paper is now available online from the publisher if you’re interested in reading it. I was very happy to see that my talk and all others were filmed. The video is not yet available, but I have uploaded the slides. In addition to giving a talk, I also presented a poster explaining the main aspects of my paper. I was pleasantly surprised when they informed me that I had won the distinguished poster award.
There are many interesting people here, including VM engineers from Mozilla and Google, Brendan Eich and Bjarne Stroustrup. I had the privilege of visiting touristic sites, sharing a meal and discussing VM design with Carl Friedrich Bolz (of PyPy fame) and Sam Tobin-Hochstadt. My main regret is that I’ve had a very difficult time adapting to the local time zone. I’m sleeping poorly at night and crashing every afternoon. This has resulted in me missing many interesting talks. I’m looking forward to the recorded videos being uploaded. The VM and language design talks from Curry On are of particular interest to me.
I haven’t written much about my progress on Higgs recently. This is because I’ve been busy writing and submitting conference papers. My ECOOP paper about basic block versioning has been accepted, and I just submitted a new paper about typed object shapes to DLS 2015. In July, I’ll be presenting my research at ECOOP in Prague, and then I’ll begin work on a third paper about interprocedural basic block versioning, which will complete my PhD work.
The unfortunate thing, for me, is that coding and working on Higgs is the part of my research that I enjoy the most (what can I say, I like building systems), and I haven’t had time to do much coding recently. Fortunately, there are a few more improvements to Higgs that I’ll implement for my next paper. I’ll be improving the calling convention, which should increase the performance of function calls quite a bit. I also have plans to improve the performance of global variable accesses even further.
The past decade has seen the rise of GPGPUs. We’re leveraging the tremendous computational power of graphics cards to accelerate computationally intensive applications such as machine learning, video compression and sorting. Unfortunately, GPGPU is somewhat slow to gain adoption. There are multiple issues involved, such as the need for special GPU/numerical programming languages, complex drivers, vendor-specific differences, and the overhead of having to shuffle data in and out of a separate memory hierarchy.
I was recently reading a blog post claiming that matrix multiplication (GEMM) is the most expensive operation in deep learning, taking up to 95% of the execution time. This got me thinking that maybe GPGPUs are simply not ideal for most applications. Maybe future CPUs should begin to include numerical coprocessors. Sure, we already have SIMD, but the way it’s implemented on x86 CPUs is awkward and relatively inefficient, forcing you to deal with multithreading, prefetching, and SIMD registers of small fixed sizes. Every few years, Intel adds support for new instructions with new SIMD register sizes, rendering your code outdated (yuck). To do SIMD well, you basically write or generate code specialized for a specific CPU model, and even then, it’s just not that fast.
I believe that the Cray 1, which came out in 1975, had the right idea. You write a small loop (kernel) and let the CPU handle the memory traffic and looping. What I’m thinking of is essentially a CPU core which is optimized to run parallel-for instructions on a reduced numerical instruction set. Imagine having a specialized CPU core that shares the same memory space as other CPU cores and can handle prefetching, parallelization and low-level optimizations according to its capabilities, without you needing to change your code. Imagine not needing a driver or threads to make use of this core. Imagine how fast matrix multiplication or cross product could be if you had native hardware support for it.
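To make the idea of a small kernel loop concrete, here is a minimal sketch of a naive matrix multiplication in plain Python. It is not tied to any real hardware or instruction set: the function name `matmul` and the comment marking the outer loop as a candidate for a parallel-for are purely illustrative assumptions.

```python
def matmul(a, b):
    """Naive matrix multiplication on lists of lists (illustrative only)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Each row of the output is independent of the others, so this outer
    # loop is the kind of loop a hypothetical parallel-for core could
    # distribute and prefetch for on its own.
    for i in range(n):
        for j in range(m):
            acc = 0.0
            # The inner loop is the small numerical kernel: multiply-accumulate
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

The point of the sketch is that the programmer only writes the loop body. Everything else (memory traffic, parallelization, vector width) is what I would like the hardware to figure out by itself.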
It’s now been over 50 years since the birth of the AI field, and we still have no androids or sentient computers. Some believe it’s because Artificial General Intelligence (AGI) is simply unachievable, others say it’s because we just don’t have the computing capacity required yet, and then some think it’s because we’re lacking a fundamental breakthrough: we just haven’t grokked the “general-purpose intelligence algorithm”. The latter view is reflected in Jeff Hawkins’ book On Intelligence.
I’m currently reading Nick Bostrom’s Superintelligence book (which I highly recommend), and it’s got me thinking about intelligent machines, a topic that’s fascinated me since childhood. My own opinion is that strong AI is almost inevitable. Assuming that humanity doesn’t destroy itself or fall into a new middle age, sentient computers will happen, it’s only a matter of time. I’m of course hesitant to put a time scale on this prediction, since such predictions have so often been wrong, but I’m inclined to believe that it will happen within this century.
I personally don’t believe that computing capacity has ever been the problem. We’ve had impressively powerful supercomputers for decades, we just haven’t had very effective machine learning algorithms. The fact is that until recently, nobody knew how to train a neural network with 5 layers of depth short of brute forcing the problem with genetic algorithms, which is obviously not what human brains do. Now, with deep learning, we’re finally starting to see some interesting progress in machine learning, with systems outperforming human beings in image recognition tasks.
Still, I don’t believe that deep learning is the one breakthrough that will lead to strong AI. Deep learning is impressive and cool, but it isn’t strong AI, it’s simply much better machine learning than what we’ve had until now. I simply don’t think strong AI will come from one single breakthrough. We’re not going to suddenly grok the one algorithm for AGI. The people who believe this, in my opinion, fail to appreciate the complexity of the human brain. Our brains are made of a multitude of components specialized to perform different tasks effectively. Hence, it would be surprising if we could obtain human level intelligence from some one single algorithm.
The obsession with cracking the AGI algorithm, in my opinion, stems from the obsession some computer scientists have with mathematical elegance and the study of algorithms that fit within a single page of text. Algorithms short and simple enough that emergent mathematical properties can easily be proved or disproved. The human brain, however, is the result of an evolutionary process that occurred over hundreds of millions of years, and is too complex to be described by an algorithm that fits within a single page. The reason we haven’t created AGI, in my opinion, is that several component parts need to be put together in non-trivial ways. At this stage, machine learning, as a field, has largely been figuring out how these individual components can be built and optimized.
I think there will come a point where we begin to put different machine learning components, such as image classifiers, planners, databases, speech recognizers and theorem provers together into a coherent whole, and begin to see something resembling AGI. I don’t think, however, that some computer will suddenly reach sentience at the flick of a switch. Engineering is similar to natural selection in that it’s often a process of iterative refinement. It’s now possible to fly anywhere on the globe in a jet-propelled airplane, but the airplanes of today are very different from the Wright brothers’ prototype. There’s no reason to think AI is any different.
The first AGIs probably won’t be “superintelligent”. It’s likely that they will initially lag behind humans in several domains. They may show great intelligence in some areas while failing to grasp key concepts about the world which seem obvious to us. One can simply think of dogs. They’re sufficiently intelligent to be useful to us, but they can’t grasp everything humans do. Provided that we do eventually create AGI, these AIs will likely reach and surpass human intelligence, but this process could take years or even decades of iterative refinement.
An interesting question, in my opinion, is whether the first AGIs will exist as agents on the internet, or whether they will come in the form of embodied robots. In the near future, I predict that we will see an increasing presence of intelligent agents online, the rise of the smart web. These agents will be used to automatically extract semantic information from web content, by doing things such as tagging videos or text with metainformation, and perhaps predicting the behavior of human individuals. This, however, is not AGI. There’s an argument to be made that robots may bring forth AGI simply because the need is there. We expect robots to effectively deal with the complexities of the real world as it is, which seems to require human-level intelligence.