The Problem with HTML

April 16th, 2015

The HyperText Markup Language (HTML) was invented by physicist Tim Berners-Lee of CERN, and its first publicly available specification surfaced on the internet in 1991. It was intended as a simple format for people to share enriched static documents or pages that could refer to one another through hyperlinks. The original design was relatively simple, with just 18 types of elements. As of today, the HTML specification in PDF format is 1156 pages long.

HTML, on its own, usually isn't all there is to a modern webpage. Most websites are also described in terms of Cascading Style Sheets (CSS) and JavaScript (JS). What's the problem, then? The problem is that HTML, which is already a hugely complex standard, is growing in complexity year after year, along with CSS and JavaScript. Furthermore, HTML is now a "living standard", meaning it's constantly changing and there is no version number. This might be the official recognition that the web is moving too fast.

As of today, Chromium is about 17 million lines of code, comments excluded. The sheer complexity of HTML+JS+CSS is enormous. This is a problem because it means there are many areas where implementations can differ. What I mean is that your webpage is increasingly likely to break or not behave the same on different web browsers. The fact that HTML is a fast-moving target and a "living standard" certainly won't help. What recently dawned on me is that HTML is too complex for any individual or corporation to ever implement a new web browser from scratch.

CSS is now complex enough that you can do animations with it. It's not really just a stylesheet format anymore. The way I see it, it's really feature-redundant nonsense. JavaScript is a programming language, and it should be plenty capable of doing animations. Giving CSS the ability to do animations is a great way to guarantee your webpage, which looks nice on Firefox, won't look right on Chrome.

Then there's JavaScript. I know JavaScript pretty well, because I've been working on my own JIT compiler as part of my thesis. It took me about two years to have nearly complete support for ECMAScript 5 (ES5) and decent performance. The ES6 draft specification is literally twice the length of the ES5 specification, a huge increase in complexity. They're not done, though: many features are already planned for ES7, even though the ES6 specification is not yet officially finalized. You can't really call JS the assembly language of the web with a straight face when it's trying to integrate all the features of Java, C++, CoffeeScript and Haskell.

A few years ago, I started playing with new HTML APIs to produce audio output. I was one of the first to play with Mozilla's Audio Data API. This API allowed you to generate raw audio samples and send them to an audio output, among other things. Unfortunately, Chrome never implemented this API. They opted instead to create the competing Web Audio API. Since Web Audio has the backing of the W3C, it became obvious it would win out, and the Audio Data API would die. It took about 3 years for Firefox to finally implement Web Audio. I'm still not sure if Safari and IE support it properly.

The victory of Web Audio over Audio Data frustrated me a bit, because I really thought Mozilla's Audio Data API had the right approach: it was simple and low-level. You generated samples and wrote them out for output. The Web Audio API, in comparison, is essentially a modular synthesizer, with many different types of audio processing nodes you can interconnect. It's way more complex than the Audio Data API. The Web Audio API is most likely implemented in C++ and interfaced with JS through several dozen types of HTML DOM nodes.
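
To give a feel for what "generate samples and write them out" means, here is a minimal TypeScript sketch. It is not the Audio Data API itself: the function names and the SampleSink type are made up for illustration, and the sink is a stand-in for whatever the browser would offer as a raw audio output.

```typescript
// Fill a buffer with raw samples of a sine wave, each in the range [-1, 1].
function generateSine(freqHz: number, sampleRate: number, numSamples: number): Float32Array {
  const samples = new Float32Array(numSamples);
  for (let i = 0; i < numSamples; i++) {
    samples[i] = Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return samples;
}

// Hypothetical sink: stands in for a low-level "write these samples" call.
type SampleSink = (samples: Float32Array) => void;

// Generate one block of a 440 Hz tone at 44.1 kHz and hand it to the sink.
function writeTone(sink: SampleSink): void {
  sink(generateSine(440, 44100, 1024));
}
```

Everything interesting (oscillators, filters, envelopes) would live in ordinary code that fills the buffer. The only thing the browser has to provide is the sink.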

The Web Audio API is stupid for two simple reasons. The first is that you really could implement your own modular synthesizer in JS (please refrain from telling me that JS is too slow for that as I've already implemented one). Letting people implement their own modular synthesizer as part of a library is more flexible and more portable. The second reason Web Audio is stupid is that implementing many kinds of pre-made synthesizer nodes in C++ and interfacing them through the DOM almost guarantees that Web Audio will not sound the same on different web browsers.
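
As a rough illustration of the first point, a modular synthesizer can be reduced to a handful of small classes that each produce one sample at a time. The sketch below is TypeScript written for this post, not the synthesizer I mentioned above, and every name in it (SynthNode, Oscillator, Gain, Mixer, renderSamples) is hypothetical.

```typescript
// A node is anything that can produce the next audio sample.
interface SynthNode {
  next(): number;
}

// Sine oscillator that tracks its phase in cycles (0 to 1).
class Oscillator implements SynthNode {
  private phase = 0;
  private step: number;
  constructor(freqHz: number, sampleRate: number) {
    this.step = freqHz / sampleRate;
  }
  next(): number {
    const sample = Math.sin(2 * Math.PI * this.phase);
    this.phase = (this.phase + this.step) % 1;
    return sample;
  }
}

// Scales the output of another node by a constant factor.
class Gain implements SynthNode {
  private input: SynthNode;
  private amount: number;
  constructor(input: SynthNode, amount: number) {
    this.input = input;
    this.amount = amount;
  }
  next(): number {
    return this.input.next() * this.amount;
  }
}

// Sums the outputs of several nodes.
class Mixer implements SynthNode {
  private inputs: SynthNode[];
  constructor(inputs: SynthNode[]) {
    this.inputs = inputs;
  }
  next(): number {
    let sum = 0;
    for (const input of this.inputs) sum += input.next();
    return sum;
  }
}

// Pull a block of samples out of the graph, ready to be written to an output.
function renderSamples(node: SynthNode, count: number): Float32Array {
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) out[i] = node.next();
  return out;
}

// Patch: two oscillators mixed together, then attenuated to stay within [-1, 1].
const patch = new Gain(new Mixer([new Oscillator(220, 44100), new Oscillator(330, 44100)]), 0.5);
const block = renderSamples(patch, 512);
```

Because the whole graph is plain code, the same patch computes the same samples on any conforming JS engine, up to small differences in floating-point math functions.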

In my opinion, the problem with HTML is the growing complexity and feature redundancy. The web is moving very fast, too fast maybe, probably in part because it's a hugely competitive market. Everyone wants HTML to support the latest and coolest feature they need in the way they like best. Design by committee ensures that bloat accumulates. I think that one day, HTML is bound to crumble under its own weight, and something more minimalistic will take its place. Something designed from the ground up for web applications instead of static webpages. The app stores used on cellphones may be a hint of things to come.