A Radical Introduction to Programming

July 10, 2012

Java for the Masses

A few years ago, when I was just starting my M.Sc. in computer science at McGill, I had to teach the COMP-202 Intro to Computing class. This is essentially an introductory programming class. The students enrolled were mostly first year computer science or computer engineering undergrads. The language I was to introduce to them was Java. During the first class, I presented the students with a typical Java “Hello World” program. I thought this would be trivial and obvious, but shortly after I showed them the program, there was a storm of questions:

“Why does it say class there?”
“What’s public?”
“What’s static void?”
“What’s main?”
“What’s System.out?”
“What are those brackets doing there?”
“What’s bytecode?”

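For reference, the program in question was the canonical Java hello-world, essentially this:

    public class HelloWorld {
        // Printing a single line already requires a class, an access
        // modifier, a static method and a call into the class library.
        public static void main(String[] args) {
            System.out.println("Hello, World!");
        }
    }
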
It quickly became obvious to me that Java was not an ideal language for beginners. Its verbosity was cumbersome and explaining example programs required the students to have faith in my teachings: “I can’t explain all the details right now, but you’ll have to trust me, this isn’t relevant to the example, it will be explained later”. A few weeks into the class, it pained me to realize that while about 60% of the class did quite well, the remaining 40% struggled with every programming concept I introduced. For some people, programming seemed intuitive and easy, but for others, it remained foreign and mysterious. Having only limited teaching experience, I assumed I was the one at fault, until I found myself in a teaching assistant position for a similar class at another university, where I witnessed the same scenario.

Many computer science departments across the world have chosen Java as their introductory language of choice. Other common choices are Python, Scheme and ML (mostly taught in Europe). My own university will likely soon be teaching JavaScript as an introductory language. These languages were chosen because they are believed to be simple and relatively easy for beginners to understand. At the very least, easier and less confusing than a language such as C, C++ or Fortran. Why is it then, that in most introductory programming classes, a large proportion of the class struggles? Why do “easy” languages still seem so difficult for some students to grasp?

I believe that one of the main issues when teaching a language such as Java, Python or Scheme to newcomers is that these languages are too high-level. Programming is all about getting computers to accomplish work. To properly understand programming requires some mental model of the computational engine that will perform this work. It shouldn’t be surprising, then, that so many students struggle. Most introductory programming classes teach nothing about the way computers work. They are taught by example using high-level languages that are fairly disconnected from how computers work internally. It remains very difficult to teach programming to students when you can’t fully explain to them how source code relates to computation and instead have to rely on handwaving.

Odd Beliefs in Computer Science

Many computer science professors hold the deep-seated belief that it’s not important at all to understand how modern computers work. The great Alan Turing has taught us that all universal Turing machines can simulate each other. Therefore, CPUs, clocks, registers, RAM, ALUs, logic gates and any other implementation details are completely irrelevant. Deep down, computer science is only pure math, the structured manipulation of symbols with abstract machines. Programmers need only understand the semantics of their language of choice. Turing would surely agree… Well, I actually found some interesting quotes about Alan Turing recently:

“Although a mathematician, Turing took quite an interest in the engineering side of computer design.”
– Maurice V. Wilkes

“He was particularly fond of little programming tricks (some people would say that he was too fond of them to be a “good” programmer) and would chuckle with boyish good humor at any little tricks I may have used.”
– James H. Wilkinson 

You may not know that Alan Turing, after his time at Bletchley Park, worked on the design of a real-world general-purpose computer, the ACE, which was ahead of its time with fast in-CPU registers, microcode, a subroutine stack and floating-point arithmetic. Turing knew how to use a soldering iron and dabbled in electronics. He was very much interested in the implementation details of real computers. Why? Because it mattered. Turing wanted to build something: a computer that could simulate a mind. Without actual computers and programming languages, computer science would be nothing more than a pipe dream.

Find me a highly-skilled programmer, someone with at least a few years’ experience, someone you admire for their coding prowess, and I can guarantee you that this person has at least a fair understanding of how computers work internally. They have an accurate mental model of computers, their capabilities and the semantics of the code they write. This understanding is part of what allows them to be masters of their craft. I’m not saying that you have to have programmed in C, assembly or VHDL to be a good programmer, of course not. I’m saying that any skilled programmer could likely manage if they had to.

Focus on the Fundamentals

Perhaps the right thing to do would be to focus on teaching intro to programming classes with something lower-level than Java. It might actually make sense to teach people assembly as a first programming language. I can hear people cringing at the thought of this idea. Assembly seems too complex, too low-level, too archaic, too arcane, tied into the implementation details of actual hardware (not portable), and not that useful in real-world programming tasks. Teaching assembly to newbies might seem like some anachronistic and weird idea to many of you.

At the same time, there are advantages to assembly as a teaching language. When you’re learning assembly, you’re learning how an actual computer processor does what it does. You’re learning about the instruction pointer, basic arithmetic operations, branching, memory layouts, the stack and how it can be used to implement recursive function calls. There is no compiler, there are no complex semantics. Everything is as plain and clear as it could be. For each instruction, the processor performs a precisely defined unit of work. The computational model is clear and unambiguous, and the root of such basic programming concepts as control flow and integer arithmetic is obvious.
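
To make that last point a little more concrete, here is a rough sketch in Java (rather than in any particular assembly dialect) of how a recursive factorial maps onto explicit stack operations, which is essentially what call and return instructions do with the machine’s stack:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class StackFactorial {
        // Ordinary recursion: each call implicitly pushes a frame on the call stack.
        static long factorial(long n) {
            return (n <= 1) ? 1 : n * factorial(n - 1);
        }

        // The same computation with an explicit stack, mimicking what the
        // processor's stack does during the recursive calls above.
        static long factorialExplicit(long n) {
            Deque<Long> stack = new ArrayDeque<>();
            while (n > 1) {              // "call": push the pending operand
                stack.push(n);
                n--;
            }
            long result = 1;             // base case reached
            while (!stack.isEmpty()) {   // "return": pop and combine
                result *= stack.pop();
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(factorial(10));          // 3628800
            System.out.println(factorialExplicit(10));  // 3628800
        }
    }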

You might still think I’m crazy. x86 assembly has hundreds of instructions and many ugly oddities that are purely due to historical accidents. There are simpler assembly languages out there (ARM, MIPS), but they are still fairly difficult for the newcomer to grasp. Here’s the thing: if you’re going to teach assembly, you don’t necessarily need to bother with these complexities. It should be possible to design a simplified assembly language for a virtual processor that still exposes the interesting concepts to be taught. Something with the following features, for example:

  • Minimal instruction set with branching, stack manipulation and integer arithmetic (15-30 instructions total)
  • 32 truly general-purpose registers, all of the same size (e.g.: 32 bits)
  • Instruction and stack pointers
  • Large, uniformly addressed memory space (e.g.: 4MB)
  • Simple calling convention: push everything on the stack; instructions can be provided to save and restore all registers at once
  • All instructions operate on general-purpose registers in the same way
  • All instructions can take a constant where a register operand can go
  • Simple and obvious instruction syntax, favoring longer names instead of cryptic mnemonics (e.g.: r3 = shift_left r1, 2)

Such an assembly language could be implemented in a simple virtual machine, for example in a web browser, to make it widely accessible. Edited code could be written to memory and run instantly with only one click. I/O operations would be kept minimal and simple, with output memory-mapped directly to video and sound buffers, and some simple library routines to make it that much easier. Useful learning tools such as real-time visualization of register contents and a color-coded view of the machine’s memory space could also be provided.
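
To give a rough idea of how small the core of such a machine could be, here is a sketch of a fetch-decode-execute loop in Java (the opcodes, the encoding and the tiny instruction set are all invented for illustration; a real teaching machine would have the full feature list above):

    public class ToyVM {
        // Hypothetical opcodes; a real teaching ISA would have 15-30 of these.
        static final int HALT = 0, LOAD_CONST = 1, ADD = 2,
                         JUMP = 3, JUMP_IF_ZERO = 4, PRINT = 5;

        int[] regs = new int[32];   // 32 general-purpose registers
        int ip = 0;                 // instruction pointer

        // Each instruction occupies 4 slots: opcode, a, b, c (unused slots are 0).
        void run(int[] code) {
            while (true) {
                int op = code[ip], a = code[ip + 1], b = code[ip + 2], c = code[ip + 3];
                ip += 4;
                switch (op) {
                    case HALT:         return;
                    case LOAD_CONST:   regs[a] = b;                 break; // rA = b
                    case ADD:          regs[a] = regs[b] + regs[c]; break; // rA = rB + rC
                    case JUMP:         ip = a;                      break;
                    case JUMP_IF_ZERO: if (regs[a] == 0) ip = b;    break;
                    case PRINT:        System.out.println(regs[a]); break;
                    default: throw new IllegalStateException("bad opcode " + op);
                }
            }
        }

        public static void main(String[] args) {
            // Count down from 5 to 1, printing each value.
            int[] program = {
                LOAD_CONST, 0, 5, 0,     // r0 = 5
                LOAD_CONST, 1, -1, 0,    // r1 = -1
                PRINT, 0, 0, 0,          // offset 8: print r0
                ADD, 0, 0, 1,            // r0 = r0 + r1
                JUMP_IF_ZERO, 0, 24, 0,  // if r0 == 0, jump to HALT at offset 24
                JUMP, 8, 0, 0,           // otherwise loop back to the PRINT
                HALT, 0, 0, 0            // offset 24
            };
            new ToyVM().run(program);
        }
    }

In a real implementation, the same loop would also update the register and memory visualizations after every step.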

Essentially, students would be programming a fancy “Commodore 64”, with all the teaching benefits of direct interaction with the hardware, but without the discomfort and impracticality brought by the restrictions of 1980s hardware and what was practical to implement at the time. The teaching system would be simplified wherever possible while still resembling a real machine in all important aspects.

Because such a simplified assembly language wouldn’t have all the complex syntax and the long feature list of a high-level programming language, it should actually be possible to teach it to students fairly quickly (perhaps within only a week or two). The true benefit is that teaching such a language also implies teaching students about a realistic computational model. Once the students understand this, teaching them about high-level languages should be much easier and much more rewarding.

24 Comments
  1. It is unfortunate that so many people, especially educators, would turn their backs on the way that computers actually work. I think that if you don’t understand at a low level what your program is actually doing, your program will suffer for it. I don’t think starting with assembler is the right way to go, but definitely learning some assembler at some point is important.

    Perhaps courses should work in both directions. Start with something like a sorting algorithm in C. In one direction, move to higher-level languages and constructs which abstract further, and in the other direction, work towards the machine code. This would tie together what happens at the highest level with what happens at the lowest level.

    • I think assembly could work as a teaching language, and give students a more solid foundation, but only provided it’s made accessible enough. I definitely don’t think one could realistically teach real-world assembly to a whole class as a first language, but the simplified assembly I proposed might just work. I’m thinking it could actually be taught and learned rapidly as the first part of a programming class.

      I think I might just try to program a “virtual retro computer”, write some programs for it and see if people like it. It will definitely make for a fun JavaScript project.

  2. Ryan permalink

    My college uses Pep/8, which was a great help for my understanding of how computers work. It really prepared me for C programming in particular. It is a simplified assembly language running on simple simulated hardware.

    Here’s a link:
    http://code.google.com/p/pep8-1/

    Excerpt from the site:
    The Pep/8 computer is a 16-bit complex instruction set computer (CISC). It is designed to teach computer architecture and assembly language programming principles. Its 39 instructions are based on an expanding opcode and are either unary (one byte) or nonunary (three bytes). The eight addressing modes and eight dot commands are designed for straightforward translation between C/C++ and assembly language.

  3. Vashish permalink

    I would suggest checking out Karel the Robot. There’s a textbook and simulators for most every computer out there. I was well advanced beyond it before I learned it in school, but I found it quite interesting just the same because of the challenge of doing things with such limitations.

  4. While learning a bit of assembly is indeed a very important part of any programmer’s education, I would say that the best *introduction* to programming is through Scheme or some other functional language. The algebraic view of computation seems to be a very gentle bridge from mathematical thinking into computational thinking.

  5. abe permalink

    I totally agree that a virtual assembly language is the way to go. I learned the LC-3 (http://en.wikipedia.org/wiki/LC-3) in college and it really exposed the “magic” of computer programming to me. It made it tangible.

    I’m more partial to a 16-bit machine because it’s easier to visualize and you can show how the 1s and 0s make up an instruction on a single page of a book.

    The intro class I took was amazing in that it started with how p and n transistors work (in a simple “switch” analogy), then showed how those transistors made simple gates, gates made registers and other logic, and then it all came together to make the physical layout of the LC-3. Then we programmed it. That’s one of the best classes I ever took.

    • I was thinking 32-bit registers because then any register can be directly used as a pointer to any part of the memory space, and it mostly eliminates the need for things like “add with carry” instructions.

      With 16-bit registers, I need to compromise and either say that the memory space is 64KB, or make pointer manipulation more complex (segment register, anyone?). If the memory is 64KB, it means the “map the video and sound output directly into memory” (DMA) scheme needs to go, or gets a little more complex. I was thinking of making the screen 512×384. This is 196,608 pixels; even at 1 bit per pixel, it would take 24KB.

      I do agree there is some charm to 64KB of RAM, however. It makes it possible to visualize the RAM and visibly see the stack grow as recursion occurs.

      • abe permalink

        64KB is plenty for someone just learning to program in assembly. The NES had a total of 4KB of RAM. The microcontroller I used in my Microcontroller class had 34KB of addressable memory (2K RAM and 32K ROM) and we programmed a tank A.I. at the end of that class to fight with other tanks in an arena.

        You don’t need fancy things like an “add with carry” instruction. 16-bit arithmetic is fine for most beginner programs. I also find it comforting to work with limited resources.

        If you really wanted to, you could make it word-addressable only and get a 128KB address space. Then you could do 256-color video (8 bits/pixel) at 256*192 resolution (enough to play a decent video game), but doing full video and audio is kind of unnecessary and over the top.

        If you ask students to make a game with hardware that can do anything, they tend to lose focus and work a lot on drawing the game art or adding sound files. When you give them hardware that can put a few color boxes on the screen and make some beeps, you’ll find that they have just as much fun, but do more actual learning. They work on making their square man jump and shoot and collide, etc., because they don’t get so distracted.

        • I’m curious. How did the NES do the drawing of sprites? Where were they stored, and how did the video controller work, roughly? If there’s an elegant way to do decent graphics and sound with 64KB of memory, it might be acceptable. I just don’t want the system to be clunky, tedious to use.

  6. David Slaybaugh permalink

    Have you tried “The Definitive Guide to How Computers Do Math: Featuring the Virtual DIY Calculator” by Clive Maxfield and Alvin Brown? It sounds exactly like what you are describing: it comes with a simple instruction set, a debugger, and several code labs. This really helped me to understand what is happening inside a processor. More details at http://www.diycalculator.com/

    • Seems like a good teaching method, but I think it might be hard getting students excited about building a calculator!

  7. Edward permalink

    Have you ever heard of MMIX?

  8. rdm permalink

    I think that teaching assembly/machine language has a lot of merits.

    I am a bit dubious about teaching a fantasy virtual machine, however. Instead of inventing a virtual machine, why not just teach using a subset of ARM, for example? (I would suggest a subset of Intel 64-bit, but in this context a simple subset seems too limiting.) There should be no requirement that the student master all instructions, all registers, all hardware, all addressing modes.

    I would not object to teaching Scheme — it can be very much the same kind of thing as machine code. Lisp, after all, was originally a machine code system, which is where names like car and cdr came from.

    That said, education should not be viewed as a linear process. And I would not hesitate to teach different points of view (for example, for describing hardware capabilities: classes based on Iverson’s notation for describing hardware processes have been comprehensible to ordinary students in high school and grade school).

    My point here is: we should not cringe away from teaching useful subjects in introductory classes. Some parts of these subjects will be approachable.

    That said… in any large class there’s a risk of leaving some students behind while not progressing at a rate suitable for other students. This seems to be inherent in the learning process, and I feel that it’s a significant problem. Students learn where they are interested and engaged, and that’s something that depends on the student. Standardized education should be about granting equal opportunities, but students’ lives will not be standardized after they leave school, and there’s probably something to be said for educational systems which are based on keeping students engaged without requiring that they remain in lock-step with each other.

  9. I’m currently a student and, interestingly, my college uses a processor called P3, completely 16-bit and with 64KB of RAM, with a simulator they made themselves (they do have a real hardware version, but I never got to use it). A problem, though, is that part of the documentation is degrading and most of it is in a foreign language.

    The only thing online about it is in here: http://p3emu.sourceforge.net/ And even then, links to the actual specifications and such are all dead.

  10. Thoughtful post. I wonder if you’d be interested in “The Elements of Computing Systems”? You can find it here (course & book): http://www1.idc.ac.il/tecs/.

    • That book has amazing reviews on Amazon, nothing under 4 stars!

      • I own a copy and have read it; it’s quite a good instructional text. I think I would have gotten more out of it had I actually followed the course and done the exercises/assignments, but it was good even without doing all of the work it comes with.

        I’m also partial to Jon Stokes’ “Inside the Machine”. A great book on systems internals, though it doesn’t go as high up the abstraction layers as the previous book.

  11. shogg permalink

    The VM you’re talking about already exists:
    http://en.wikipedia.org/wiki/Core_War

    This VM has something like 10-20 opcodes and 1-10 KiB of memory.

    Today it’s a game (of sorts). Your students would write little programs and let them battle each other.

  12. I’ve talked to a lot of friends about this exact topic and how horrible it is to start teaching programming with high-level languages. The top-down approach used for teaching computer science these days leads to students who treat the physical computer as a deep black box. Even worse, day by day they internally develop a growing fear of _anything_ related to low-level software development.

    I believe that such a teaching method won’t lead to a generation which can someday invent a new programming language, a new compiler, or an efficient networking protocol. The top students will make it _anyway_, regardless of the teaching method used; my point is always about “the masses” of CS students.

    To put my money where my mouth is, I designed a small & unofficial summer course at my university to teach first-year students programming from the ground up: we started from Babbage engines and moved slowly to Intel x86 assembly and (a little) operating system details. I’ve even gone one step further: no textbooks; only the original primary references for each topic were used.

    I’ll never forget the expressions on the students’ faces and their ever-engaging questions. Once they were in the mindset of “deep understanding” and “historical progression” of the topics laid out, they began to ask hard-core questions all the time! Honestly, it felt like opening a can of worms, and I had to go back and re-check lots of references to make sure that I was providing valid answers ;-)

    People grasped topics like computation, pointers, compilation, linking, and even complex operating system details in a snap, and I was very happy. Furthermore, and as nicely portrayed in your example of Alan Turing, this approach is not really radical _at all_: Donald Knuth can be another nice example since his original quest was compilers after all.

    In brief, the history of computer science reveals that the physical details of computers and mathematical theory were never mutually exclusive: they deeply _complemented_ each other. So why are we creating such stupid and artificial boundaries in our CS curricula?

    • Sounds like a brave and successful effort on your part. Are you sharing your material online? :)

      I think that there’s a large amount of disdain for programming and low-level details, including hardware. Those things, while foundational to our discipline, are seen as lowly, primitive, detracting from more “pure” mathematical quests.

      I think the truth is that many people have difficulty grasping those things. Debugging programs forces you to reason about often extremely complex interactions within a system and can be very challenging; most people would rather avoid it. One way to avoid it is to do as little programming as possible.

      Mathematicians prefer to believe that computation has nothing to do with the machines it runs on, that mathematical expressions float in some void somewhere in an ethereal space, applying themselves to each other endlessly… That if only they could will the purity of their thoughts into a machine, there would no longer be any bugs.

      The truth is that bugs happen because once a system becomes sufficiently complex, its behavior becomes too difficult for any human mind to fully grasp and predict a priori. Writing code will cause bugs, writing good code will always be challenging, no matter the language.

      • The material was basically a folder of collected papers from:
        * IEEE Annals (for the Babbage and von Neumann stuff)
        * Articles by Vannevar Bush and the like (they provide a great bridge between the past and the future)
        * Some references on the UNIVAC (it had some nice parallels with the 8086 regarding the culture it created)
        * a single paper on MIT’s CTSS (to introduce the basic ideas of a supervisor/kernel)
        * official references for the Intel x86 architecture and the early PC stuff.
        * and finally a plain-text file outlining the course’s path

        Assignments and projects were taken from a very nice free book called “Programming from the Ground Up” by Jonathan Bartlett. But much of the focus was centered on the primary references.

        It was an unofficial and cozy gathering (I only have a bachelor’s degree, no master’s or Ph.D. yet), but I tried to measure my performance by student engagement and final project results. There were around 20 students, resulting in four groups. Two groups had a final project of reversing .wav audio files, with success. Another created a heap allocator similar to the one in K&R’s C book, with success. The last group faced some difficulties, but around 50% of their project was working.

        And the above did not affect their growth on the “high-level” or “software engineering” front at all: most of them are now Java and web/mobile development folks anyway. So I guess I can consider it a successful experiment.

        I wish to evangelize this teaching method a bit more, and my little new blog is a first step toward that.

  13. Look at Cecil. I started on Z80 assembler and soon after 6502. Cecil was taught in school as a reasonable emulation of how something like a PDP-8 would work and I wrote an interpreter in Z80 assembler for the class to use.

    Spend a few lessons doing stuff like that and then bring in a higher-level language when the concepts are clear. Most of the syntax of higher-level languages stems from the old architecture.

  14. Michael permalink

    You really should take a look at the http://nand2tetris.org course.

Trackbacks & Pingbacks

  1. Designing a toy CPU « Pointers Gone Wild
