I'm happy to see that my previous post, A Radical Introduction to Programming ended up generating quite a bit of discussion on r/coding. The reception was fairly warm overall. Most people seemed to think that the idea of teaching simplified assembly as a first programming language made sense. Those who disagreed seemed to mostly come from two camps: adepts of functional programming, and others who simply didn't see the potential of assembly as a teaching platform, for a variety of reasons.
I would like to say that I'm not suggesting the teaching of assembly should replace teaching of high-level languages. What I would like to do with assembly, is give students a short 2-4 week introduction to computers and programming. Fundamental concepts such as integer arithmetic, branching, the run-time stack, recursive function calls and arrays would be exposed in a very concrete, hands-on manner. Ideally, students should then be able to move on to higher-level languages with more confidence, and perhaps even learn more advanced material at a faster pace.
The positive responses to my post have motivated me to go ahead and develop an online emulator for simplified assembly, and perhaps even build a series of tutorials around it (help with the writing would be appreciated). I don't necessarily think that this system will be the perfect learning platform for everybody, some people might learn better with other teaching methods, but I certainly think it's worth at least putting it out there and letting people try it. If just a few people learn something, I'll be happy. I, for one, think I would have loved such a system when I was 9-10 and started getting interested in computers!
Some people have asked why I'd want to design my own assembly and emulator when there are other existing assembly options out there, some of which already have browser-based emulators. It has been suggested, for example, that I could instead use (from most to least practical):
Notch's DCPU-16 architecture (in-browser emulators exist)
6502 (NES, C64, Apple II) assembly (in-browser emulators exist)
A subset of ARM assembly
Physical microcontroller devices (Atmel, PIC)
Actual vintage C64 computers
I'll go right ahead and confess that I do like the idea of rolling my own, and I feel very confident in my own ability to do so. This isn't the only motivating factor, however. While it may not have as much of a cool factor as a physical computer, there is clearly a definite teaching advantage to having an in-browser emulator that can be accessed from anywhere at a cost of zero dollars and can't be broken by accident. As for other assembly languages, I feel they generally have arbitrary restrictions that will hinder teaching. My goal is not really to simulate a retro computer, it's to make the language as accessible as possible, and things like having only three usable 8-bit registers can get in the way.
As is usually the case when I start getting excited about a project, I had a flood of ideas for my emulated system. I've been giving some thought to architectural considerations, trying to strike a balance between realism, simplicity and ease of use by novices. The following is largely a brain dump of my thoughts on the design of the system:
The instruction set will have the standard arithmetic (add, sub, mul, div, mod) and bitwise (and, or, xor, not, shift) operations, as well as multiple conditional branching instructions. The goal is not to make the instruction set as small as possible. It is to make it somewhat realistic, very easy to grasp, while demonstrating important concepts.
I'm thinking a 16-bit architecture might be best at this point. All registers would be 16-bit, and the memory space would contain 64K words (either 8 or 16-bit each). As someone pointed out, 16-bit registers can easily be displayed on one page, showing all the bits. A 64K memory space is small enough that one can easily visualize it and see the stack pointer move, while being large enough to allow useful programs to be written.
There will most likely be 16 registers in total. 12 fully general-purpose registers, a flags register, the stack pointer, the instruction pointer, and another register currenty reserved for future uses. All registers will be usable in any operation that can take a register argument.
The instruction encoding will be fairly naive. Variable length. One opcode byte, one operand format byte, extra bytes for operands (register indices or constants). The point is not to have a realistic ARM or x86-like encoding, but rather that the instructions have an encoding in memory, and that this encoding be easy to decode if desired.
I'm unsure at this point if the RAM should be byte-addressable, or addressable only by 16-bit words. The latter has the advantage that we get twice the ram (128K!) and we can probably offer better visualization of values in the RAM, as they couldn't be unaligned. The disadvantage is that real-world systems are usually byte-addressable (sort-of?).
I was initially thinking of memory mapping (DMA) all the display memory. This is problematic because even at 256x192 and 8 bits per-pixel, that's 48KBs, most of the RAM. I think I will instead rely on a virtual display adapter that responds to simple commands (draw a square, draw a circle, draw a line, draw a sprite) and has its own memory spaces for the frame buffer and sprites. This will allow me to "cheat". RAM will be freed, people will be able to plot individual pixels if they so desire, but they will also be able to do fast rendering of decent-resolution graphics in 24bpp glory.
I'm thinking the sound chip should probably be a simple 16-voice subtractive synthesizer, with the standard waveforms, an LFO and a filter, all able to follow individual envelopes. The possibility of playing raw audio data would be trivial to support as well.
For some realism, virtual devices will be interacted with through a simulated parallel master-slave bus (the CPU being the master, all devices being slaves). Communication will be in the form of 16-bit words. Library routines will be provided to wrap all the gory (not that bad, actually) details and provide simple calls that satisfy most use cases.
The library code will be browsable (read-only), and possibly indexed with a doxygen/javadoc-like system, allowing you to get a description of all available routines and even see their implementation if you so desire. The library should be fairly extensive and allow you to get started writing useful programs right away.
I'm not sure if it's best to grow the stack up or down. Is there a disadvantage to growing it upwards?
I've been thinking of a name for the system. Right now I'm leaning towards toyCPU (short and sweet).
I'll look into possibly using Google AppEngine to store user-created programs online, and perhaps allow users to rate them, comment on them and fork their own version. This should help grow a user base.
I already started working on the implementation. I decided to use the CodeMirror library to give the code a nice display with line numbers and syntax highlighting. I think I might start to have something working this weekend, but I probably won't make it publicly available until it's closer to complete. I figure it's best to wait until the implementation is relatively stable and there is a mechanism to share your code online so that people don't waste effort or lost interest before the system is ready. I may, however, be willing to share a link with a few people early (beta testing, anyone?).
If you're interested in helping, I could use help at various stages:
Setting up an AppEngine platform for code sharing and community building.
Writing virtual devices, such as the graphics adapter, the sound adapter and the system clock.
Writing the standard library.
Writing cool demo programs (simple games, music, basic/scheme interpreter, demoscene, etc.)
Writing programming tutorials.
Beta testing.
Promoting the platform and tutorials.
If you're interested in helping, please let me know. I will warn you that I'm somewhat of a code nazi, but I do my best to be friendly.