Hacker News new | ask | show | jobs
by perfmode 3347 days ago
Is it theoretically impossible to fit an interpreter for a dynamic programming language in the L1 cache of a modern chip?

(I understand there are physical constraints that prohibit super low-latency memory lookups (of unconstrained size) in 0+epsilon time (where epsilon is small))

5 comments

Symbolics managed to fit their Lisp VM into the cache of DEC Alpha processors in the early nineties: http://pt.withy.org/publications/VLM.html
Thanks for the information.

> We built a prototype of the emulator in C, but it quickly became obvious that we could not achieve the level of performance desired in C. Examination of code emitted by the C compiler showed it took very poor advantage of the Alpha's dual-issue capabilities. A second implementation was done in Alpha assembly language and is the basis for the current product.

First pass in C. Final, in Assembly.

Chip at the time was a first-generation DEC Alpha AXP 500 which had a 512 KB B-cache and two 8 KB caches.

https://en.wikipedia.org/wiki/DEC_Alpha

Let's say its present day and you want to fit into a 256K L2. What language toolchains are available? How far can one go with JIT?

I think Lua, statically compiled against musl libc, can fit in 200KiB. Not in a (32KiB) L1 cache as the grand-parent comment asked, but in L2. There's also LuaJIT, which I think is only a bit bigger, I'm not sure ...
> Is it theoretically impossible to fit an interpreter for a dynamic programming language in the L1 cache of a modern chip?

I'm pretty sure Chuck Moore (yes, he's still around) would be able to fit the interpreter and an entire OS into the L1 cache with room to spare. Forth technically is an interpreter.

You can get a FORTH kernel in 2K words. It's also incredibly efficient for your own code since you basically have a dictionary and memory addresses. Thinking about it, in the old days dictionaries stored 8 character identifiers, which will handily fit in a 64 bit word. That means that the dictionary only needs 2 words per entry.

As you imply, the interpreter will be dwarfed by the code needed to talk to the rest of the OS. As an aside, this is why I was initially very excited about the JVM when Java first came around. Compiling down to a FORTH style language should give you pretty impressive benefits.

Virtual machines were very popular for a long time, but I'm not entirely convinced that we've really pushed the concept as far as it can go.

The amount of Forth code you could put in a small space is amazing. One of reasons a 128k Mac had Forth show up on it relatively quickly.

I remember back in the day someone was working on a Forth version of OpenStep. Weird, but there were some funky Forths then too. I so wish they had succeeded.

> Is it theoretically impossible to fit an interpreter for a dynamic programming language in the L1 cache of a modern chip?

The APL, J and K languages are known for being very fast; I vaguely remember one reason being given that their programs are so terse and high-level, they're compact enough to fit in cache. Not sure if the interpreter would though.

Also, apl j k have low interpretation cost, so you're pushing the metal close to its limits.
Even fitting the interpreter in cache, there's obviously still some overhead to interpreting instructions rather than executing them directly.

Also, I suspect most of the memory related slowdown with interpreters is due to the indirections in memory representation of data/code, not the interpreter itself falling out of cache.

There was a period where Opera had the fastest javascript engine of all the browser. It was a stack based runtime like all others at the time and, as a side-effect of being developed for mobile devices, was small enough to fit on a cache line. That was the key to the performance advantage. Then came V8 and everyone changed over to JITing compilers.