
Designing a toy CPU

July 13, 2012

A Positive Response

I’m happy to see that my previous post, A Radical Introduction to Programming, ended up generating quite a bit of discussion on r/coding. The reception was fairly warm overall. Most people seemed to think that the idea of teaching simplified assembly as a first programming language made sense. Those who disagreed mostly came from two camps: adepts of functional programming, and others who simply didn’t see the potential of assembly as a teaching platform, for a variety of reasons.

I would like to say that I’m not suggesting that teaching assembly should replace teaching high-level languages. What I would like to do with assembly is give students a short 2-4 week introduction to computers and programming. Fundamental concepts such as integer arithmetic, branching, the run-time stack, recursive function calls and arrays would be exposed in a very concrete, hands-on manner. Ideally, students should then be able to move on to higher-level languages with more confidence, and perhaps even learn more advanced material at a faster pace.

The positive responses to my post have motivated me to go ahead and develop an online emulator for simplified assembly, and perhaps even build a series of tutorials around it (help with the writing would be appreciated). I don’t necessarily think that this system will be the perfect learning platform for everybody, since some people might learn better with other teaching methods, but I certainly think it’s worth at least putting it out there and letting people try it. If just a few people learn something, I’ll be happy. I, for one, think I would have loved such a system when I was 9-10 and started getting interested in computers!

Some people have asked why I’d want to design my own assembly and emulator when there are other existing assembly options out there, some of which already have browser-based emulators. It has been suggested, for example, that I could instead use (from most to least practical):

I’ll go right ahead and confess that I do like the idea of rolling my own, and I feel very confident in my own ability to do so. This isn’t the only motivating factor, however. While it may not have as much SWAG as a physical computer, an in-browser emulator has a clear teaching advantage: it can be accessed from anywhere at a cost of zero dollars and can’t be broken by accident. As for other assembly languages, I feel they generally have arbitrary restrictions that will hinder teaching. My goal is not really to simulate a retro computer; it’s to make the language as accessible as possible, and things like having only three usable 8-bit registers can get in the way.

Architectural Considerations

As is usually the case when I start getting excited about a project, I had a flood of ideas for my emulated system. I’ve been giving some thought to architectural considerations, trying to strike a balance between realism, simplicity and ease of use by novices. The following is largely a brain dump of my thoughts on the design of the system:

  1. The instruction set will have the standard arithmetic (add, sub, mul, div, mod) and bitwise (and, or, xor, not, shift) operations, as well as multiple conditional branching instructions. The goal is not to make the instruction set as small as possible, but to make it somewhat realistic and very easy to grasp while demonstrating important concepts.
  2. I’m thinking a 16-bit architecture might be best at this point. All registers would be 16-bit, and the memory space would contain 64K words (either 8 or 16-bit each). As someone pointed out, 16-bit registers can easily be displayed on one page, showing all the bits. A 64K memory space is small enough that one can easily visualize it and see the stack pointer move, while being large enough to allow useful programs to be written.
  3. There will most likely be 16 registers in total: 12 fully general-purpose registers, a flags register, the stack pointer, the instruction pointer, and another register currently reserved for future use. All registers will be usable in any operation that can take a register argument.
  4. The instruction encoding will be fairly naive. Variable length. One opcode byte, one operand format byte, extra bytes for operands (register indices or constants). The point is not to have a realistic ARM or x86-like encoding, but rather that the instructions have an encoding in memory, and that this encoding be easy to decode if desired.
  5. I’m unsure at this point if the RAM should be byte-addressable, or addressable only by 16-bit words. The latter has the advantage that we get twice the RAM (64K words of 16 bits each, so 128 KB!) and we can probably offer better visualization of values in the RAM, as they couldn’t be unaligned. The disadvantage is that real-world systems are usually byte-addressable (sort of?).
  6. I was initially thinking of memory-mapping all of the display memory. This is problematic because even at 256×192 and 8 bits per pixel, that’s 48 KB, most of the RAM. I think I will instead rely on a virtual display adapter that responds to simple commands (draw a square, draw a circle, draw a line, draw a sprite) and has its own memory spaces for the frame buffer and sprites. This will allow me to “cheat”. RAM will be freed, people will be able to plot individual pixels if they so desire, but they will also be able to do fast rendering of decent-resolution graphics in 24bpp glory.
  7. I’m thinking the sound chip should probably be a simple 16-voice subtractive synthesizer, with the standard waveforms, an LFO and a filter, all able to follow individual envelopes. Playing raw audio data would be trivial to support as well.
  8. For some realism, virtual devices will be interacted with through a simulated parallel master-slave bus (the CPU being the master, all devices being slaves). Communication will be in the form of 16-bit words. Library routines will be provided to wrap all the gory (not that bad, actually) details and provide simple calls that satisfy most use cases.
  9. The library code will be browsable (read-only), and possibly indexed with a doxygen/javadoc-like system, allowing you to get a description of all available routines and even see their implementation if you so desire. The library should be fairly extensive and allow you to get started writing useful programs right away.
  10. I’m not sure if it’s best to grow the stack up or down. Is there a disadvantage to growing it upwards?
  11. I’ve been thinking of a name for the system. Right now I’m leaning towards toyCPU (short and sweet).
  12. I’ll look into possibly using Google AppEngine to store user-created programs online, and perhaps allow users to rate them, comment on them and fork their own version. This should help grow a user base.
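
To make items 2 to 4 a little more concrete, here is a minimal sketch in Python of how such a naive variable-length encoding might be decoded and executed. It is only an illustration under stated assumptions, not the actual toyCPU design: the opcode numbers, the layout of the operand format byte (two bits per operand slot), the little-endian constants, the three-operand form, and the names `decode` and `execute` are all hypothetical.

```python
MASK16 = 0xFFFF  # item 2: all registers are 16 bits wide

# Hypothetical opcode numbers; the post does not assign any.
OPCODES = {0x01: "add", 0x02: "sub", 0x03: "mul"}

# Hypothetical operand kinds, packed two bits per operand slot
# into the operand format byte.
KIND_NONE, KIND_REG, KIND_IMM16 = 0, 1, 2


def decode(mem, pc):
    """Decode the instruction at byte offset pc.

    Returns (mnemonic, operands, next_pc), where operands is a list of
    ("reg", index) or ("imm", value) tuples.
    """
    mnemonic = OPCODES[mem[pc]]  # byte 0: opcode
    fmt = mem[pc + 1]            # byte 1: operand format
    pos = pc + 2                 # operand bytes follow
    operands = []
    for slot in range(3):
        kind = (fmt >> (2 * slot)) & 0b11
        if kind == KIND_NONE:
            break
        if kind == KIND_REG:
            # One byte holding a register index (item 3: 16 registers).
            operands.append(("reg", mem[pos] & 0x0F))
            pos += 1
        elif kind == KIND_IMM16:
            # Two bytes holding a 16-bit constant, little-endian (assumed).
            operands.append(("imm", mem[pos] | (mem[pos + 1] << 8)))
            pos += 2
        else:
            raise ValueError(f"unknown operand kind {kind}")
    return mnemonic, operands, pos


def execute(regs, mem, pc):
    """Run one three-operand arithmetic instruction; return the next pc."""
    mnemonic, (dst, a, b), next_pc = decode(mem, pc)
    if dst[0] != "reg":
        raise ValueError("destination must be a register")

    def read(operand):
        kind, value = operand
        return regs[value] if kind == "reg" else value

    x, y = read(a), read(b)
    result = {"add": x + y, "sub": x - y, "mul": x * y}[mnemonic]
    regs[dst[1]] = result & MASK16  # wrap around to 16 bits
    return next_pc


# add r0, r1, 0xFFFF encoded as: opcode, format byte, r0, r1, constant
program = bytes([0x01, 0x25, 0x00, 0x01, 0xFF, 0xFF])
regs = [0] * 16
regs[1] = 2
next_pc = execute(regs, program, 0)
print(regs[0], next_pc)  # prints "1 6"
```

Because the format byte says up front how many operand bytes follow, the decoder never has to guess an instruction’s length, which is the “easy to decode” property item 4 is after. The wrap-around in `execute` also shows why 16-bit registers are easy to visualize: 2 + 0xFFFF overflows and leaves 1 in the register.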

Help Wanted

I already started working on the implementation. I decided to use the CodeMirror library to give the code a nice display with line numbers and syntax highlighting. I think I might start to have something working this weekend, but I probably won’t make it publicly available until it’s closer to complete. I figure it’s best to wait until the implementation is relatively stable and there is a mechanism to share your code online so that people don’t waste effort or lose interest before the system is ready. I may, however, be willing to share a link with a few people early (beta testing, anyone?).

If you’re interested in helping, I could use help at various stages:

  • Setting up an AppEngine platform for code sharing and community building.
  • Writing virtual devices, such as the graphics adapter, the sound adapter and the system clock.
  • Writing the standard library.
  • Writing cool demo programs (simple games, music, basic/scheme interpreter, demoscene, etc.)
  • Writing programming tutorials.
  • Beta testing.
  • Promoting the platform and tutorials.

If you’re interested in helping, please let me know. I will warn you that I’m somewhat of a code nazi, but I do my best to be friendly.

7 Comments
  1. I see a close resemblance between the architecture you described and the Intel 8085. Although I do not have much experience programming different old architectures, I can tell that the Intel 8085 is a pretty simple one to grasp. Its instruction set is also pretty simple and easy to write an emulator for. I made one, which required machine code to be entered after hand assembly. The design was very simple, though the code to emulate each instruction was lengthy. For starting out with assembly and getting a feel for the underlying hardware, I think it would be easier, as the architecture is easier to understand than modern CPUs. Therefore I think the Intel 8085 instruction set is worth a look.

  2. Kylie

    “I will warn you that I am a code nazi” That’s the end of this project. Every software engineer HATES code nazis!!!!! Even code nazis hate each other because they can’t agree on coding styles.

    The software engineering field is a seller’s market. Most software engineers have no trouble moving away from annoyances like code nazis. Management learns very quickly that when the code nazi is placed on their project, all the good engineers move on. Then they are left holding the bag, trying to explain to upper management why the project failed. There are only two situations where programmers have to listen to the code nazi: a) they are an indentured servant laboring under an H-1B visa, or b) they are an indentured servant laboring under a major professor in grad school.

    Rarely, if ever, is a program written by one person. Just about every program is written as a collaborative effort. Everybody needs to get along. Part of that is respecting the work of other team members. Lambasting former team members’ work is acceptable. But being a code nazi and criticizing the style of somebody else’s code gets you nowhere. Any criticism needs to be universally viewed by other team members as constructive.

    -Kylie

    • I try to keep my criticism relevant and constructive. I don’t really criticize people on their coding style, although I do encourage people to try to follow the style of the project, and follow good practices like documenting their code using doxygen/javadoc/jsdoc and not leaking a million things into the global namespace.

      My coding effort is progressing fairly well so far. I’m already using other people’s code in there (I chose the CodeMirror js lib for my code editor). I will probably end up putting the code on github under a BSD license when I feel I have something worth showing (hopefully just a few days from now). Hopefully, if I can get the attention of some programmers, some people might be willing to help me add features. I have a long list of cool things I’d like to implement, probably too long for just one person.

    • Rom

      “Every software engineer HATES code nazis!!!!!”

      Well, that’s your opinion but that doesn’t make it a universal truth.

      I, for one, prefer contributing to projects with strong code guidelines and will make sure to respect them to the best of my abilities. It’s not about criticizing others for the fun of it but instead aiming for more homogeneity through defined rules.

      I’m also the main developer (i.e. code nazi) behind an open source project written in my spare time that grew over the last few years to a not-so-big but decent 180k lines of code.
      There have been quite a few contributors in the community and they’ve all very strictly followed the code guidelines/style.
      Those are not even written anywhere! They simply achieved it by mimicking the code already present and no one ever complained.
      It’s actually quite the contrary: many (contributors or users) have praised the consistency of the code, finding it much more accessible and readable than average.

      Maxime, I don’t currently have much spare time but I’ll be following your progress closely as soon as you make your work public and will happily contribute later on, if you still need it. =)

      • The code is progressing well so far. I hope to have enough to show within a week or so. I minimally need to have some graphical output capability and the beginnings of a standard library. I will likely put the code on github and post an update on this blog, along with a list of tasks I could use help with.

