by loup-vaillant 5309 days ago
Indirectly related: the size of current systems. A typical desktop system is written in about 200 million lines of code (about 10K books, or a library). http://vpri.org/ (co-founded by Alan Kay) is trying to make a roughly equivalent system in 20K LOC, or about one single book. And it looks like they can do it (5 years into the project, 1 more year to go).

Let's say it is possible. That would mean current systems are about ten thousand times bigger than they could be. That's 4 orders of magnitude. And even if it isn't 4 full orders of magnitude, I'm willing to bet on 3.
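To make the arithmetic explicit, here is a quick sketch using only the rough figures quoted above. The constant and function names are mine, and the numbers are the comment's estimates, not measurements.

```python
import math

# Rough figures quoted in the comment above (estimates, not measurements).
DESKTOP_LOC = 200_000_000  # "about 200 million lines of code"
TARGET_LOC = 20_000        # the "20K LOC" goal
BOOKS = 10_000             # "about 10K books"


def orders_of_magnitude(big: int, small: int) -> float:
    """Return log10 of the size ratio between two code bases."""
    return math.log10(big / small)


ratio = DESKTOP_LOC // TARGET_LOC      # how many times bigger
lines_per_book = DESKTOP_LOC // BOOKS  # implied lines per "book"

print(f"ratio: {ratio}x")
print(f"orders of magnitude: {orders_of_magnitude(DESKTOP_LOC, TARGET_LOC):.1f}")
print(f"lines per book: {lines_per_book}")
```

The implied 20K lines per book is also why a 20K LOC system counts as "about one single book".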

It is not yet about raw speed, or latency. But when a system is at least 3 orders of magnitude bigger than it could be, it does mean that something there is vastly suboptimal. And runtime performance could very well be part of that "something".


Yes, but is that 20K LOC system equivalent in functionality to the larger systems? In every respect, and not just the ones you happen to care about?
Just the ones they happen to care about. I don't think it matters a great deal, however: people tend to care about the same things. Feature creep is what happens when you want to fully satisfy everyone, a few people at a time. Plus, if you want your missing feature, you can code it. I mean, you really can. Many components of that system don't take more than 1K LOC, so they really are accessible.

But that's kind of a straw man. Even if you convince me that feature creep really is valuable, lack of features explains only 1 order of magnitude out of 4. There are still 3 to go. I have two explanations for those.

First, they reuse their code. A lot. When they write a compiler, all phases (parsing, AST to intermediate language, optimizations, code generation) are done with the same tool (augmented Parsing Expression Grammars, search for the OMeta language for more details). When they draw something on the screen, be it a window frame, a drawing, or text, they again use a single piece of code. Mere factorization goes a long way. I'd say it explains about 1 order of magnitude as well.

Second, their use of specialized languages yields astonishing results: they can build a self-implementing compilation system in about 1000 lines (including a bunch of optimizations). 200 more lines get you a reasonably efficient implementation of Javascript, 200 more get you Prolog, and a couple hundred more can get you about any DSL you may want (external DSLs, not your average Ruby/Haskell combinator library). They implemented an equivalent of Cairo in 457 lines, which is about 100 times smaller (and quite efficient to boot, but that was a surprise bonus). They did a TCP/IP stack in about 160 lines, which again is about 100 times smaller than a typical C implementation. And they did all that with specialized languages that are themselves implemented in very little code. Based on that, I'd say their use of domain specific languages explains about 2 orders of magnitude. (Don't take my word for it. See their last progress report here: http://www.vpri.org/pdf/tr2011004_steps11.pdf )

To sum up, we could argue that current systems are about 4 orders of magnitude too big. Of the 4, 1 may be debatable (lots of features). Another (not reusing and factorizing code) is obviously something that has Gone Wrong™ (I mean, it could have been avoided if we had cared about it). The remaining 2 (DSLs) are a Silver Bullet. Not enough to kill the Complexity Werewolf, but it sure makes it much less frightening. By the way, we should note that the idea of DSLs has been around for quite some time. Not using them so far may count as something that has Gone Wrong as well, though I'm not sure.
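As a sanity check on that tally, here is a small sketch that adds up the orders of magnitude attributed to each cause and scales the 20K LOC figure back up. The dictionary keys and variable names are mine, and the per-cause numbers are the rough guesses from this comment, nothing more.

```python
# Orders of magnitude attributed to each cause in the summary above.
# These are the comment's rough guesses, not measured values.
breakdown = {
    "extra features": 1,           # debatable
    "no reuse/factorization": 1,   # "Gone Wrong"
    "no DSLs": 2,                  # the "Silver Bullet"
}

total_orders = sum(breakdown.values())
factor = 10 ** total_orders

small_system_loc = 20_000  # the 20K LOC target
implied_current_loc = small_system_loc * factor

print(f"total: {total_orders} orders of magnitude (x{factor})")
print(f"implied size of current systems: {implied_current_loc} lines")
```

The result lands back on the 200 million lines quoted at the start, so the three explanations do account for the whole gap, if you accept the guesses.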