Building a Small-Scale Prototype

February 4, 2016

I’ve blogged before about my plans to create a programming language of my own. I have a long list of features I’d like to experiment with. I want to incorporate features that are not commonly found in other languages. For instance, the language is going to have an extensible grammar. I’d like to have program images that can be suspended and resumed like in Smalltalk. I’m also trying to sketch out a JIT compiler that compiles itself. Unfortunately, so far, I’ve spent more time thinking about the design and implementation of this language than writing any code. I wrote a decent amount of code, then ended up scrapping it and starting over, only to think that I should maybe start over again.

Part of the problem, I think, is that I keep wanting to plan 20 steps ahead. My language, zeta, is going to have a few features which are fairly uncommon or experimental, such as the extensible grammar. I don’t know ahead of time precisely how these features are going to work out as part of the overall design. There are many micro-decisions to make, things that could be implemented this way or that way, and that’s where I keep getting stuck. I keep worrying about issues such as performance, garbage collection, multithreading, and how different language features will integrate with one another. Inevitably, I end up having to admit that there are simply a lot of important questions I don’t yet have an answer to. Language design is hard, and so is compiler design.

I think that this is a common problem with ambitious engineering projects. When you’re trying to innovate (or just do things differently) on ten different fronts at the same time, there are simply too many unknowns, too many degrees of freedom. This is why innovation is typically more incremental than revolutionary. People build on trusted foundations, and maybe they add one or two cool new features on top. When you try to reinvent everything, there are just too many problems to be solved at once, and solving them all simultaneously gets in the way of building your product. You want to create something that’s globally optimal, as good as you can make it, but you keep having to make technical decisions which you simply don’t have enough information to make. Each of these decision points stalls you a little bit, because making the optimal technical decision requires taking the time to pause and gather more data. Worse yet, not all of these technical decisions are independent; some are simply incompatible with one another. Choices you make now will push you down a certain path later on. Sometimes, you make a certain set of choices, and later on you realize that you went down the wrong path, and you have to backtrack.

I keep wanting to do the ambitious thing, and getting into complex, elaborate designs as a result. I start building, and I often end up backtracking. I’ve decided to take a step back and ask myself some important questions, such as “Why do I even want to create a programming language?” and “What am I getting out of this?” One of the things I’ve come to realize is that creating a programming language as a hobby project for the simple joy of doing it, and creating a programming language which I hope will eventually become popular and dethrone Python, are two very different projects with very different requirements. Creating a programming language for fun is a project with very loose requirements and much freedom to explore, motivated by the joy of programming I discovered when I was a teenager (and hope to keep alive for a long time). On the other hand, creating a language in the hope of whacking Python in the gonads with a 2×4 is a project likely motivated by my inflated ego, and my failure to integrate mindfulness meditation into my daily routine.

Designing and implementing a popular programming language means I have to care about many issues such as:

  • How learnable/palatable the syntax is to the masses
  • How beginner-friendly the language is
  • Integration with C/C++ code
  • Deployment, how easy it is to install my compiler/VM
  • Being able to generate binary executables
  • Multi-platform support, including mobile devices
  • Unicode support and internationalization
  • The absolute performance of my language
  • Multithreading and parallel performance
  • Security of my implementation (exploits)
  • Responding to the wants and needs of the community

All of these issues, and more, are things that necessarily strongly influence the design of a language. They are things that limit my ability to freely explore language design. Most importantly, creating a language with popularity in mind means that this project starts to be very much about caring about the wants and needs of the community. With so many limitations on my ability to pick the language design I want, and having to care about (and work for) a huge community, the project starts to look a lot like a job rather than a fun hobby. Hence, if I want to explore language design, maybe I ought to go about this project differently, scale down my ambitions, and go for something less grandiose.

I’ve decided that I’m probably not ready to build a commercial/popular language. Instead, I’m building a small-scale prototype. I’ve asked myself “What’s the most important language feature I want to experiment with?” The answer is the extensible grammar. What I’m doing, then, is building a sort of Minimum Viable Product (MVP). A tiny minimalist language with just the features I need, so that I can get to the interesting part of experimenting with an extensible grammar as soon as possible. Right now, this language is interpreted only, and it has no garbage collector (it never deallocates). Even this, I’m finding out, is challenging. Even this tiny project involves solving many technical problems, but things are already much simpler, and most importantly, much more fun. Once I have the extensible grammar working well, I’ll move on to adding other features, and exploring other equally interesting technical problems.
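To make “extensible grammar” a bit more concrete, here is a minimal sketch in Python. It is not zeta’s actual design: the `Grammar` class, its `add_infix` method, and the binding-power numbers are all hypothetical names and values invented for illustration. It only shows the core idea, which is that the set of operators the parser understands lives in a table that a program can extend at runtime. Like the prototype described above, it is interpreted only, with no compilation step.

```python
import operator
import re


class Grammar:
    """Toy expression grammar whose infix operators can be added at runtime.

    Hypothetical illustration only; this is not how zeta is implemented.
    """

    def __init__(self):
        # operator symbol -> (binding power, function that evaluates it)
        self.infix = {}

    def add_infix(self, symbol, power, fn):
        """Extend the grammar with a new left-associative infix operator."""
        self.infix[symbol] = (power, fn)

    def tokenize(self, source):
        # Try longer operators first so "**" is not split into "*", "*".
        ops = sorted(self.infix, key=len, reverse=True)
        # Integers, then known operators, then any other non-space character.
        parts = [r"\d+"] + [re.escape(op) for op in ops] + [r"\S"]
        return re.findall("|".join(parts), source)

    def evaluate(self, source):
        tokens = self.tokenize(source)
        value, pos = self._expr(tokens, 0, 0)
        if pos != len(tokens):
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        return value

    def _expr(self, tokens, pos, min_power):
        # Precedence climbing: keep consuming operators that bind
        # more tightly than the context we were called from.
        left, pos = self._atom(tokens, pos)
        while pos < len(tokens) and tokens[pos] in self.infix:
            power, fn = self.infix[tokens[pos]]
            if power <= min_power:
                break
            right, pos = self._expr(tokens, pos + 1, power)
            left = fn(left, right)
        return left, pos

    def _atom(self, tokens, pos):
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok == "(":
            value, pos = self._expr(tokens, pos + 1, 0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            return value, pos + 1
        if tok.isdigit():
            return int(tok), pos + 1
        raise SyntaxError(f"unexpected token {tok!r}")


g = Grammar()
g.add_infix("+", 10, operator.add)
g.add_infix("*", 20, operator.mul)
print(g.evaluate("2 + 3 * 4"))  # 14

# "%" is not part of the grammar yet, so "7 % 4" would raise SyntaxError.
# Extending the grammar at runtime makes it valid:
g.add_infix("%", 20, operator.mod)
print(g.evaluate("7 % 4"))  # 3
```

A real extensible grammar would presumably let programs add whole new statement forms and literals, not just operators, but even this toy version raises the kinds of questions that make the feature worth prototyping in isolation, such as how extensions interact with tokenization and precedence.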

10 Comments
  1. Have you watched Jonathan Blow’s programming language and compiler videos?
    He was facing many of the same issues you are. He is approaching the task from a different set of problems though.

  2. Andrew

    Standards versus the ego.

    If I’m not mistaken, it took 4 years for N. Wirth to design and implement Pascal in the ’60s.
    I guess, even today, with all the available technologies, it is still of the same order of magnitude.
    Moreover, an attempt to make everything by oneself seems to be more about ego than ideals.
    After all, a language, even formal, is first of all a social tool for communication of knowledge, precisely, and less for interaction with the machine (all those keywords and names from the “popular” language etc.).
    So I don’t think it is even “noble” or just long-term efficient to try to implement everything yourself.

    Instead, I agree, it is a much better idea to just prioritize the features, and implement them as time permits.
    And reuse the existing technologies where possible, even though they are often far from perfect.

    Popularity and novelty.

    Also, I don’t think it is even a good idea to aim at popularity, to design something whose purpose is to be popular… If it becomes popular, cool; if not, that’s not bad either.
    Industrial standards
    (1) are not that broken, mostly fulfill their functions, and
    (2) are significantly independent from the research.

    So one should respect the standards where possible, or have good reasons not to do so, regardless of whether the interest is more in basic research or in solving a concrete problem.

    Candidates for the platform.

    Have you considered just implementing your ideas on top of an existing translator, developed as FOSS?
    Say, Common Lisp’s macros let you implement any S-expression-based syntax almost effortlessly, or literally any syntax via read-tables with a little more effort.
    Or OCaml’s (G)ADT’s for a syntax trees, or Dependent Types, or HoTT, or CoIC…
    By the way, OCaml has an intermediate (bytecode) compilation feature (may be easier to play with),
    and a “compiler compiler” to experiment with the syntax (a la yacc/lex). And yet, it is really small and clean (unlike Haskell).

    But I guess you know all of this, since, if I recall it correctly, you wrote previously that you had some experience with both, OCaml and Lisp.
    So why not avoid reinvention of the wheel wherever possible?

  3. The classical problem between “perfect solution” and “getting things done”. Strategies that I follow in such a case are “design for change” and “one thing at a time”. The latter if possible totally decoupled from the rest.

    Bringing things together later will have an impact on the single parts, but by then you understand much better what needs to be done.

    BTW: Because a lot of people blame management for sloppiness in technical decisions, good managers have a feeling where to move forward and don’t seek perfection. Because at the end of the day, what counts is what you deliver.

  4. dicebot

    Looking at successful language technology (not actual languages but how they are implemented), the key approach seems to be keeping separate things separate, which is hardly surprising, considering this is a good approach in software development in general. An environment to suspend images shouldn’t care about language syntax, and a semantic frontend shouldn’t assume much about how it is used. Designing a compiler around a collection of independent libraries with well-defined APIs both allows you to focus on one area at a time and allows you to reuse them for additional tooling later.

  5. I recommend Josh Bloch’s talk on API design:

    After all, what is a programming language but a user interface that hides the complexity of writing machine code by hand? I think the two most important points for your situation are:
    * Map out your features in small, informal use cases
    * Start by writing the code that calls the interface, not the code that implements it.

    In other words, figure out the kinds of programs that you’d like to make easy with your language, and then write those first. This makes it really easy to catch flaws, because you can change anything that doesn’t work on the fly. Then, once you have a good collection of detailed, consistent programs, you can infer the grammar, and write the compiler.

    He also discusses (if I remember right) getting the power-to-weight ratio right: that is, only include a big feature, if it lets you cover many important use cases. If not, leave it for later.

  6. Notation matters. For quite a long time the Romans labored using a numerical system that required people doing arithmetic to compute things like: MCXLVII times XXVIIII.

    The existing notations are clumsy, error-prone, and have very low productivity. You can spend hours trying to fix a small error. This is all well and good for those paid by the hour to do so, but given the amount of software that needs to be written, there is an urgent need for a new notation for general purpose graphical programming.

    Interestingly, one of the great pioneers of computers, John Backus, in his Turing Award lecture, talked about the different kinds of languages: procedural (Fortran, Algol, C, etc.), substitution (LISP, FORTH), and his new category, functional. Unfortunately Mr. Backus did not complete his functional programming work. He only got halfway.

    There are breakthroughs around the corner. Deductive (also referred to as Declarative) languages have tremendous power, and the potential for that 10:1 reduction that would be necessary to achieve adoption.

    I am a big fan of Professor Wirth’s languages, but unfortunately, the bulk of the programming world did not go down his path, but chose other less elegant paths. The Java era is thankfully ending, and JavaScript is now the dominant language. It however was never designed to be the graphical interactive language for the masses, so it suffers from very deep flaws.

    I would enjoy speaking with you further on the subject.

  7. I like the idea of a Minimal Viable Product and the language discussion in this thread very much. It reflects a lot of what I’ve been faced with, when I was thinking about a totally different VR related project of similar complexity.

    Starting an MVP helped me a lot to better understand the details without losing the big picture of my vision. So far it has helped me to see its weaknesses and to think about better solutions to reach the goal.

    In my project, the choice of programming language plays an important role. I’ve not yet made any decision, but as simplicity, independence and concurrency are key to my vision, Go could be the perfect candidate, even if its weakness is the graphics part.

    Except to test the impact of some details I have not yet written a single line of code, but that was not my intention anyway. My first goal is to define all of the needed pieces and how they fit together, so I can build a simple working model (hopefully) by the second half of this year.

    Today I know I can never do the project by myself. I hope a simple model allows me to better explain and demonstrate the concept to the crowd and hopefully get some interested VR enthusiasts and developers on board.

  8. William H. Mitchell

    One of the things about language design that I learned from Ralph Griswold, who created the SNOBOL family and Icon, is that if you’re going to create a language, create a language that *you* want to use.

  9. Only recently ran across this post, and figured I might as well make a suggestion: if you decide to add execution images at some point, then I advise saving them specifically as NEW executables instead of modifying the original file. I spent a while thinking about persistent data (among several other topics; the subject of persistence being inspired by both LPC via DGD, and the Apple Newton), and ultimately decided that when network-file-systems are brought into consideration, the entire idea of writing the persisted data back to the original file breaks down due both to possible network unreliability, as well as the possibility of running the same executable on multiple machines simultaneously. The attempt can be made to cheat your way around this by having some mechanism on the file-server that arbitrates such attempts, but ultimately I concluded that the only place such a technique can reliably (and by implication should ever) be attempted is on the same machine running the executable in the first place, and that if executable-merging is desired, then it’s better to do it on the execution-side than on the storage-side.