Sometimes a datum is just a datum

July 22nd, 2013

The Smalltalk and SELF programming languages are famous, among other things, for pushing forward the idea that "everything is an object". That is, every value in the language behaves as if it were an object in the object-oriented programming (OOP) sense. All values (integers, strings, arrays and functions included) have their own identity and methods, and are potentially part of some inheritance hierarchy. The addition operator, for instance, is a method defined on integer and string objects. More recent languages such as Python, Ruby and JavaScript have attempted to replicate this idea, with varying degrees of implementation quality.
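Python, one of the languages mentioned above, makes this fairly concrete. The minimal sketch below (the `square` function is a hypothetical example, not from the original) shows that integers, strings and functions are all objects, and that `+` on built-in values is backed by ordinary methods such as `int.__add__` and `str.__add__`:

```python
# Integers and strings are objects: their classes derive from `object`.
assert isinstance(5, object) and isinstance("hi", object)

# `5 + 3` is resolved through the int class's __add__ method...
print((5).__add__(3))        # 8
# ...and string concatenation through str.__add__.
print("foo".__add__("bar"))  # foobar


# Functions are objects too: they have a type and attributes.
def square(x):
    return x * x


print(type(square).__name__)  # function
print(square.__name__)        # square
```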

I believe the goals of the original idea are interesting ones:

The way you manipulate and interact with all values should be as uniform as possible.
The behavior of every operator and method should be redefinable at run time.

One implication is that, in a language with a proper implementation of these concepts, you don't need both typeof and instanceof operators: instanceof alone can query the type of any value (e.g. 5 instanceof Number would evaluate to true, whereas in JavaScript it evaluates to false). Another useful consequence is that you would be able to redefine operators such as addition or multiplication on existing types to extend the behavior of the language. For example, you could create your own numerical matrix class, and add support for multiplication between scalar numbers and matrices to the language.
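As a rough illustration in Python (which does not go as far as Smalltalk), a single `isinstance` check works on every value, plain integers included, and a user-defined class can add scalar-times-matrix multiplication through the reflected-operator hook `__rmul__`. The `Matrix` class below is a hypothetical toy written for this sketch, not a real library:

```python
# One type-test operation works on every value, including plain numbers.
print(isinstance(5, int))      # True
print(isinstance(5.0, float))  # True


class Matrix:
    """Hypothetical toy dense matrix, stored as a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __mul__(self, other):
        # matrix * scalar: scale every entry.
        if isinstance(other, (int, float)):
            return Matrix([[other * x for x in row] for row in self.rows])
        return NotImplemented  # let Python try other options or raise

    def __rmul__(self, other):
        # scalar * matrix: int.__mul__ returns NotImplemented for a Matrix,
        # so Python falls back to this reflected method on the right operand.
        return self.__mul__(other)

    def __repr__(self):
        return f"Matrix({self.rows})"


m = Matrix([[1, 2], [3, 4]])
print(2 * m)  # Matrix([[2, 4], [6, 8]])
```

Python stops short of the full idea, though: user classes can extend operators this way, but the methods of built-in types such as `int` cannot be reassigned at run time (attempting `int.__add__ = ...` raises a `TypeError`).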

These ideas are seductive, but at the same time, one wonders how desirable they are. The fact remains that values such as integers and floating-point numbers will not be represented in the same way as objects internally: this is an illusion we try to maintain for the programmer. Implementing core language operators as methods adds a layer of indirection which a compiler will have to work to optimize away, and may not be able to optimize away fully. Then there is the question of whether allowing basic operators to be redefined is even a good idea. Is it good programming style to allow any part of your codebase to redefine the fundamental operators that everything else depends on?
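To make that layer of indirection concrete, here is a simplified Python model of how `a + b` might be resolved when operators are methods. It loosely follows Python's `__add__`/`__radd__` protocol; `binary_add` is a hypothetical name, and the model deliberately omits Python's rule that a subclass's reflected method is tried before the parent's forward method:

```python
def binary_add(a, b):
    """Simplified, hypothetical model of how `a + b` is dispatched in Python.

    Omits the rule that a subclass's reflected method is tried first.
    """
    # Step 1: look up __add__ on the left operand's *type* and call it.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # Step 2: if that failed and the operand types differ, try the right
    # operand's reflected method __radd__.
    if type(a) is not type(b):
        radd = getattr(type(b), "__radd__", None)
        if radd is not None:
            result = radd(b, a)
            if result is not NotImplemented:
                return result
    # Step 3: no method accepted these operands.
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{type(a).__name__!r} and {type(b).__name__!r}"
    )


print(binary_add(2, 3))      # 5
print(binary_add("a", "b"))  # ab
print(binary_add(1, 2.5))    # 3.5
```

Conceptually, every `+` in a program goes through these lookups. Before an optimizing compiler can emit a single machine add for two integers, it has to prove, or speculate and guard, that the lookup lands on the built-in integer addition; techniques such as type specialization and inline caching are commonly used for this.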

I like to think about language design issues such as these from both a usability standpoint and a compiler design perspective. As a compiler designer, I believe that optimizing away the overhead of presenting everything as an object is entirely feasible with modern optimization techniques, though it adds complexity, and potentially some overhead of its own, behind the scenes. On the other hand, I'm really not sure whether the implementation costs are worthwhile, especially since the hypothetical benefits to programmers might not materialize if we end up encouraging coding styles that increase entanglement and reduce maintainability in the long run. I'd be curious to hear the perspective of former or current Smalltalk/SELF users, if any of you happen to read this post.