Sometimes a datum is just a datum

July 22, 2013

The Smalltalk and SELF programming languages are famous, among other things, for pushing forward the idea that “everything is an object”. That is, every value in the language behaves as if it were an object in the Object Oriented Programming (OOP) sense. All values (integers, strings, arrays and functions included) have their own identity and methods and are potentially part of some inheritance hierarchy. The addition operator, for instance, is a method defined on integer and string objects. More recent languages such as Python, Ruby and JavaScript have attempted to replicate this idea with varying degrees of implementation quality.

I believe the goals of the original idea are interesting ones:

  1. The way you manipulate and interact with all values should be as uniform as possible.
  2. The behavior of every operator and method should be redefinable at run time.

One implication is that a language with a proper implementation of these concepts does not need both typeof and instanceof operators: you can use instanceof to query the type of any value (e.g. 5 instanceof Number would produce the boolean value true). Another useful consequence is that you would be able to redefine operators such as addition or multiplication on existing types to extend the behavior of the language. For example, you would be able to create your own numerical matrix class, and add support for multiplication between scalar numbers and matrices to the language.
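As a rough illustration, here is how present-day JavaScript falls short of the first goal and how the second can only be approximated. The `Matrix` class and the `times` method are hypothetical names made up for this sketch. JavaScript has no operator overloading, so the scalar-by-matrix product is written as a method added to the built-in `Number` prototype rather than as a redefinition of `*`.

```javascript
// Primitives and their wrapper objects answer type queries differently,
// which is why JavaScript ends up needing both typeof and instanceof.
const typeQueries = {
  typeofPrimitive: typeof 5,                          // "number"
  primitiveIsInstance: 5 instanceof Number,           // false
  typeofWrapper: typeof new Number(5),                // "object"
  wrapperIsInstance: new Number(5) instanceof Number, // true
};

// Hypothetical matrix class, for illustration only.
class Matrix {
  constructor(rows) {
    this.rows = rows;
  }

  // Returns a new Matrix with every element multiplied by the scalar k.
  scale(k) {
    return new Matrix(this.rows.map((row) => row.map((x) => x * k)));
  }
}

// Assumption: `times` is an invented method name. It extends the existing
// number type so that scalar-times-matrix reads left to right.
Number.prototype.times = function (matrix) {
  return matrix.scale(this.valueOf());
};

const doubled = (2).times(new Matrix([[1, 2], [3, 4]]));
console.log(typeQueries);
console.log(doubled.rows); // [[2, 4], [6, 8]]
```

Patching a built-in prototype this way affects every number in the program, so this is a demonstration, not a recommendation.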

These ideas are seductive, but at the same time, one wonders how desirable they are. The fact remains that values such as integers and floating-point numbers will not be represented in the same way as objects internally: this is an illusion we try to maintain for the programmer. Implementing core language operators as methods means we’re adding an additional layer of indirection which a compiler will have to work to optimize away, and may not be able to optimize away fully. Then, there’s the question as to whether allowing basic operators to be redefined is even a good idea. Does it make for good programming style to allow any part of your codebase to redefine fundamental operators everything depends on?

I like to think about language design issues such as these from both a usability standpoint and a compiler design perspective. As a compiler designer, I believe that optimizing away the overhead involved in everything being presented as an object adds behind-the-scenes complexity and potential overhead to a system, but is entirely feasible with modern optimization techniques. On the other hand, I’m really not sure whether the implementation costs are worthwhile, especially considering that hypothetical benefits to programmers might not really materialize if we’re actually encouraging coding styles that increase entanglement and lead to reduced maintainability in the long run. I’d be curious to hear the perspective of previous or current Smalltalk/SELF users, if any of you happen to read this post.

  1. Ben Reale

    This is an interesting article, and I’m assuming it would also be relevant to C#, with regard to every type inheriting from ‘object’ plus some trivial functionality (e.g. concatenation with ‘string’), and to C++, which encourages operator overloads.

    However, my question to you, as someone who is experienced with compiler design, is this: considering that most overloads and simple inheritance are rather trivial, isn’t there room for inlining functions and then optimizing the inlined code?

    Also, in high-level languages such as C#, where simple data types are expected to inherit ‘object’ plus some functionality, wouldn’t it be prudent to have said object only contain virtual references to a preloaded set of functions which can then be called as needed?

    • This discussion is actually mostly relevant to dynamic languages (late-bound). In JavaScript, for example, you could, from anywhere in your code, at any moment, redefine the array concatenation method:

      Array.prototype.concat = myOwnFunction;

      C#, Java and C++ are largely statically typed and early-bound, and you can’t do things like this. You would also be defining a new operator on a derived class, whereas in JS people could redefine the method or operator on the base class itself.

      Maybe that’s an important point though. Possibly, even in a dynamic language, key runtime methods and operators should be marked read-only to prevent nasty things.
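      A minimal sketch of what “read-only” could look like in today’s JavaScript, assuming the standard Object.defineProperty API is an acceptable way to do the locking. The function name tryToRedefineConcat is invented for this example, and the lock is applied by ordinary user code here rather than by the runtime itself.

      ```javascript
      // Lock the built-in method: it can no longer be reassigned or reconfigured.
      // Attributes not listed here (value, enumerable) keep their current values.
      Object.defineProperty(Array.prototype, "concat", {
        writable: false,
        configurable: false,
      });

      // Hypothetical helper that attempts the redefinition shown earlier.
      function tryToRedefineConcat() {
        "use strict"; // in strict mode, writing to a read-only property throws
        try {
          Array.prototype.concat = function myOwnFunction() {
            return "hijacked";
          };
          return "redefined";
        } catch (err) {
          return err instanceof TypeError ? "rejected" : "unexpected error";
        }
      }

      const outcome = tryToRedefineConcat();
      const stillWorks = [1, 2].concat([3]);
      console.log(outcome);    // "rejected"
      console.log(stillWorks); // [1, 2, 3]
      ```

      Outside strict mode the same assignment fails silently instead of throwing, which is arguably worse for debugging.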

      • Ben Reale

        Ah yes, I seem to have missed that this article was aimed at dynamic languages; that makes sense. My personal opinion is that JS already allows people to get away with so much that making key methods read-only may only delay the inevitable failure of poorly written code.

        However, I can also see your point that optimizing arbitrarily redefined core operators will certainly involve some overhead, kind of like dynamically redefining core string ops in a dynamically loaded module just before a big parse… performance headaches for all.

  2. I have noticed a change in performance issues in the last 10+ years. Cache misses now dominate our performance issues. For example, replacing red-black tree maps with B-tree maps sped up our code by 10%. When we profile our code, I see pointer dereferences dominate its run time. Any language feature that causes cache misses, like a method jump table, should be minimized for performance reasons.
    An x86 processor can run 500 instructions in the time it takes to service one cache miss. A single-threaded program where cache misses dominate can bring a system to its knees. On an i7-970 (6 cores, 12 threads), I have trouble playing solitaire when cache misses dominate a single thread.
    The program I work on is Enventive, an MCAE program best known for tolerance analysis. It is basically CAD-like, and we are pushing it into 3D.
    Demo of it:

  3. It is possible to go in the opposite direction: add extra metadata to the object besides the container that connects the object-ness with the underlying data. In the case of Python, this object-ness is the reference count and type. For example, the set of permissible operations that can be applied to the underlying data referenced can be attached, so that every first-class value appears to the programmer in a way akin to files in an operating system, with modes that limit the permissible operations. I’ve made the argument that this paradigm enables a new form of collaboration:
