Hacker News
by 4984 1629 days ago
MiniVM will JIT eventually. Right now the biggest barrier is the VM snapshots. Currently MiniVM has its own stack and heap and can snapshot them from any point in the program.

One thing to note about MiniVM is that it has very strong types. V8 has many steps to go through when it needs to perform something like addition (weak typing). MiniVM supports only 4 basic types.
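
To make the contrast concrete, here is a rough sketch in Python. It is not V8's or MiniVM's actual code: `weakly_typed_add` is a heavily simplified model of a coercing addition, and `strongly_typed_add` assumes a single number type, since the four MiniVM types aren't listed in this thread. All function names are made up for illustration.

```python
def to_number(value):
    """Simplified stand-in for a coercing to-number conversion."""
    if value is None:          # models a null-like value
        return 0.0
    return float(value)        # bools become 0.0 / 1.0, numbers pass through


def to_text(value):
    """Simplified stand-in for a coercing to-string conversion."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return value
    number = float(value)
    return str(int(number)) if number.is_integer() else repr(number)


def weakly_typed_add(a, b):
    """Addition with coercion: several checks run before any arithmetic."""
    if isinstance(a, str) or isinstance(b, str):
        return to_text(a) + to_text(b)
    return to_number(a) + to_number(b)


def strongly_typed_add(a, b):
    """Addition without coercion: one type check, then the arithmetic."""
    if type(a) is not float or type(b) is not float:
        raise TypeError("add expects two numbers")
    return a + b
```

The point is only that the second function has one path to consider, which is the kind of thing that keeps an interpreter loop (and later a JIT) simple.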

MiniVM's reference frontend, Paka, is fully self-hosted. It is currently the only large MiniVM program.

3 comments

That sounds like a solid basis for adding asymmetric coroutines as well, is that something you're thinking about adding?

I like the philosophy and gumption shown by this project but coroutines aren't a feature I'd lightly give up.

Digging what I see so far!

MiniVM uses coroutines already: every 1000 branches the VM will return to the scheduler. The build system can accept CFLAGS+=-DVM_BRANCH_DEFER, which makes MiniVM able to run part of the bytecode, up to N instructions.

It is not exposed on the instruction level yet tho. Would be quite easy to add.
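
For anyone wondering what "return to the scheduler every N branches" can look like, here is a minimal sketch in Python. It is not MiniVM's implementation: the `TinyVM` class, its three-instruction bytecode, and the `schedule` function are all invented for illustration, and only the figure of 1000 branches comes from the comment above.

```python
BRANCH_BUDGET = 1000  # figure from the thread; treated here as a tunable constant


def countdown_program(n):
    """Hypothetical bytecode: push n, then decrement until it reaches zero."""
    return [("push", n), ("dec",), ("jnz", 1), ("halt",)]


class TinyVM:
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.stack = []
        self.done = False

    def run(self, branch_budget=BRANCH_BUDGET):
        """Run until halt or until `branch_budget` branches have executed.

        Returns True if the program finished, False if it yielded early.
        """
        branches = 0
        while not self.done:
            op, *args = self.program[self.pc]
            self.pc += 1
            if op == "push":
                self.stack.append(args[0])
            elif op == "dec":
                self.stack[-1] -= 1
            elif op == "jnz":
                if self.stack[-1] != 0:
                    self.pc = args[0]
                branches += 1
                if branches >= branch_budget:
                    return False  # hand control back to the scheduler
            elif op == "halt":
                self.done = True
            else:
                raise ValueError(f"unknown opcode: {op}")
        return True


def schedule(vms, branch_budget=BRANCH_BUDGET):
    """Round-robin over the VMs; returns how many time slices were handed out."""
    pending = list(vms)
    slices = 0
    while pending:
        vm = pending.pop(0)
        slices += 1
        if not vm.run(branch_budget):
            pending.append(vm)  # not finished: back of the queue
    return slices
```

Counting only branches keeps the check off the straight-line path, since a program can only run for an unbounded time by looping.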

Ah, thank you, that's very promising! It was the kind of thing I wanted to hear: I write Lua every day, and four data types didn't appear to support 'thread'.

I might just kick the tires on this over the weekend.

That’s cool. Have you played around with that number? I’m curious how you pick that constant and what the trade-offs are.

It might be interesting to be able to tell the VM to resume into code for exactly N instructions or branches. I can see instrumentation/debugging tooling using this with N=1, for example.
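
A rough sketch of the "resume for at most N instructions" idea, in Python. This is a made-up accumulator machine and not MiniVM's API: `StepVM`, `resume`, and `trace` are hypothetical names used only to show how N=1 gives you a single-stepping tracer.

```python
class StepVM:
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.acc = 0
        self.halted = False

    def resume(self, max_instructions):
        """Execute at most `max_instructions` instructions; return how many ran."""
        executed = 0
        while executed < max_instructions and not self.halted:
            op, arg = self.program[self.pc]
            self.pc += 1
            if op == "add":
                self.acc += arg
            elif op == "mul":
                self.acc *= arg
            elif op == "halt":
                self.halted = True
            else:
                raise ValueError(f"unknown opcode: {op}")
            executed += 1
        return executed


def trace(vm):
    """Single-step the VM (N=1), recording (pc, acc) after every instruction."""
    states = []
    while vm.resume(1) == 1:
        states.append((vm.pc, vm.acc))
    return states


PROGRAM = [("add", 2), ("mul", 5), ("add", 1), ("halt", None)]
```

With this shape, a debugger is just a loop around `resume(1)`, and a scheduler is the same loop with a larger N.
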
Question on VM snapshotting: what’s the purpose/point in even having such an ability? What does it allow you to do?

I only know of snapshotting perhaps being necessary to support coroutine based context switching.

Thanks and very cool project!

A snapshot is the entire state of a program at a single moment in time. Continuations are basically exposed snapshots, i.e. taking a snapshot, storing it in a variable, doing some work, and then 'calling' the snapshot to return to an earlier point. Continuations allow you to implement a naive version of single-shot delimited continuations - coroutines! This can be very useful for modeling concurrency.
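
The "take a snapshot, store it in a variable, do some work, call it" sequence can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Machine` class and its fields are hypothetical, and the state is copied wholesale with `copy.deepcopy` rather than by anything MiniVM actually does.

```python
import copy


class Machine:
    def __init__(self):
        self.stack = []
        self.heap = {}
        self.pc = 0

    def snapshot(self):
        """Capture the whole machine state as an independent copy."""
        return copy.deepcopy((self.stack, self.heap, self.pc))

    def restore(self, snap):
        """'Call' a snapshot: replace the current state with the captured one."""
        self.stack, self.heap, self.pc = copy.deepcopy(snap)


def demo():
    machine = Machine()
    machine.stack.append(1)
    machine.heap["x"] = [1, 2]
    machine.pc = 7

    saved = machine.snapshot()    # store the snapshot in a variable

    machine.stack.append(99)      # do some work that changes the state
    machine.heap["x"].append(3)
    machine.pc = 42

    machine.restore(saved)        # return to the earlier point
    return machine.stack, machine.heap, machine.pc
```

Because `restore` copies the snapshot again, the same snapshot can be called more than once here; a single-shot version would simply invalidate it after the first use.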

Aside from coroutines and continuations, snapshots are neat for distributed computing: spin up a vm, take a snapshot, and replicate it over the network. You could also send snapshots of different tasks to other computers to execute. In the context of edge computing, you could snapshot the program once it's 'warm' to cut back on VM startup time.
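
As a sketch of the replication idea (Python, no real networking): if the VM state is plain data, a snapshot can be encoded to bytes, shipped anywhere, and decoded into independent copies. The state layout and the `step`, `dump_snapshot`, `load_snapshot`, and `replicate` helpers are all assumptions made up for this example.

```python
import json


def step(state):
    """Toy 'execution': advance the program counter and bump a heap counter."""
    state["pc"] += 1
    state["heap"]["ticks"] += 1
    return state


def dump_snapshot(state):
    """Encode a VM state (JSON-friendly values only) as bytes for transport."""
    return json.dumps(state, sort_keys=True).encode("utf-8")


def load_snapshot(payload):
    """Decode bytes produced by dump_snapshot into a fresh, independent state."""
    return json.loads(payload.decode("utf-8"))


def replicate(payload, count):
    """Stand-in for sending the same snapshot to `count` other machines."""
    return [load_snapshot(payload) for _ in range(count)]


def warm_snapshot():
    """'Warm up' a VM for a few steps, then snapshot it once."""
    state = {"pc": 0, "stack": [], "heap": {"ticks": 0}}
    for _ in range(3):
        step(state)
    return dump_snapshot(state)
```

Each replica starts from the warm state and then diverges on its own, which is the startup-time trick described above.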

Snapshots allow you to peek into your program. Imagine a debugger that takes snapshots on breakpoints, lets you inspect the stack and heap, and replay the program forward from a given point in a deterministic manner. You could also send a snapshot to a friend so they can run an application from a given point on their machine. If you do snapshots + live reloading there are tons of other things you can do (e.g. live patching and replaying of functions while debugging).

To do these kinds of things on Linux, check out https://www.criu.org/Main_Page

Checkpoint/Restore In Userspace

One nifty thing you can do with snapshots in games is use it for save/restore/undo.
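
A minimal sketch of snapshot-based undo in Python, assuming the whole game state is small enough to copy on every move. The `Game` class and its move format are invented for illustration; this is not the Z-machine or Glulx save format.

```python
import copy


class Game:
    def __init__(self):
        self.state = {"room": "cellar", "inventory": []}
        self._history = []  # one snapshot per move, taken before the move

    def apply(self, move):
        """Apply a (kind, value) move, snapshotting the state first."""
        kind, value = move
        if kind not in ("go", "take"):
            raise ValueError(f"unknown move: {kind}")
        self._history.append(copy.deepcopy(self.state))
        if kind == "go":
            self.state["room"] = value
        else:
            self.state["inventory"].append(value)

    def undo(self):
        """Restore the most recent snapshot; return False if there is none."""
        if not self._history:
            return False
        self.state = self._history.pop()
        return True

    def save(self):
        """Return a snapshot suitable for writing to a save slot."""
        return copy.deepcopy(self.state)

    def load(self, saved):
        """Replace the current state with a saved snapshot and clear undo."""
        self.state = copy.deepcopy(saved)
        self._history.clear()
```

Save, restore, and undo all fall out of the same copy-the-state operation, which is why they come almost for free once a VM can snapshot itself.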

See for example the Z-machine used for interactive fiction, and its younger cousin Glulx: https://www.eblong.com/zarf/glulx/Glulx-Spec.html#saveformat

(Apart from that, I’d note that Glulx and the Z-machine are both terrific examples of detailed VM specifications. Glulx is impressive because it was built by one person, the Z-machine possibly even more impressive as it was reverse-engineered by insanely dedicated Infocom fans.)

Out of curiosity, are you planning on (progressively, slowly) rolling your own JIT, or using something like DynASM (https://luajit.org/dynasm_features.html), libFirm (https://pp.ipd.kit.edu/firm/Features.html), or some other preexisting thing (eg https://github.com/wdv4758h/awesome-jit) in the space?

FWIW, I understand that LuaJIT gets some of its insane real-world performance from a JIT and VM design that's effectively shrink-wrapped around Lua semantics/intrinsics - it's not general-purpose. I've read (unfortunately don't remember exactly where) that people have tried to run off with the JIT and VM and use it in other projects, but never succeeded because of the tight coupling.

In the same way, while a bespoke 64/32-bit x86+ARM JIT would be a reasonable undertaking, it could make for a pretty interesting target with a potentially wide-ranging set of uses.

For example, it could be the VM+JIT combination that all those people trying to dissect LuaJIT were looking for :).

I could see something like this becoming an attractive option for games that want an exceptionally simple runtime. Sort of like a scaled-down NekoVM (a la Haxe).

Broadly speaking, I get the (potentially incorrect) impression (from a very naive/inexperienced POV) that MiniVM+JIT would be looking to close a more widely-scoped, higher-level loop out of the box than something like libFirm would be (despite the very close correlation in functionality when looking at libFirm). So it'd be closer to Cling (https://root.cern/cling/) than raw LLVM, perhaps (albeit with 0.01% of the code size :D). It is for this reason that I kind of pause for a minute and ponder that a fully integrated JIT could be a pretty good idea. It would absolutely make the irreducible complexity of the project balloon, with reasonable motivation likely necessary to maintain cohesion.

On a different note, if I were to backseat-drive for a minute :) the first thing I'd rant about is that modern JITs, to be attractive, need trivial ways to verify code correctness, both online (how exactly was *this* specific generated code constructed - so, straightforward logging) and also (and particularly) offline (humans staring at the JIT source code and mentally stepping through its behavior - and succeeding (miracles!!) because the code is small and comprehensibly architected). If the JIT were implemented in such a straightforward manner, end users wanting to run potentially malicious user-supplied code with high performance in potentially security-sensitive settings might be attracted to the project. (Mike Pall made bank for a while when CloudFlare was using LuaJIT for its WAF*... ahem...)

I came across this reference of how to break out of LuaJIT 2.1 (2015) a while back: https://www.corsix.org/content/malicious-luajit-bytecode - and every time I take a look at the code I switch away from the tab :) (and sometimes step away from the computer for a minute :D). It's solely a demonstration of "this is how it would work", and clarifies that LuaJIT makes no sandbox guarantees about the code it executes, but reading through it, the amount of Stuff™ going on represents a surface area that to me (naively) seems just... like LuaJIT as a whole is generally too large to easily reason about from a security standpoint (oh yeah, besides being written in assembly language...). This might be inexperience speaking, but I can't help but wonder whether a smaller, simpler implementation might be able to implement a secure JIT; for all I know this might be an impossible P=NP pipe dream I haven't fully grasped yet. I guess what I'm trying to figure out is whether "small enough for non-100x programmers to mentally reason through" and "large enough to do a few practical things quickly" have practical overlap?

[* CloudFlare only ran internally-generated code through LuaJIT, for completeness. A VM+JIT that could safely run untrusted code would thus be even more interesting than LuaJIT was to CloudFlare, from a general perspective. ]

---

Something completely different that I randomly discovered recently and which I thought I'd idly mention is that JITs seem to have a bit of a hard time on Android in certain obscure circumstances. I can't (yet) tell if this is LuaJIT-specific or "anything that's a JIT"-specific: KOReader (Android eBook reader, implemented entirely using LuaJIT - https://github.com/koreader/android-luajit-launcher) has a bunch of very scary magic Things™ it seems to need to do to make LuaJIT even work at all on Android (https://github.com/koreader/android-luajit-launcher/blob/mas...), due to an apparently-current issue causing problems across different domains (https://github.com/LuaJIT/LuaJIT/issues/285), which has apparently been cropping up going back years (https://www.freelists.org/post/luajit/Android-performance-dr... (2013)). KOReader has even nontrivially patched LuaJIT's C code in places (https://github.com/koreader/android-luajit-launcher/blob/bb0...) with purposes I am yet to fully understand (it might just be for debugging). I happened to be considering idly playing around with LuaJIT on Android (currently fascinated with interpreted/JITed runtimes) and after stumbling on this I'm debating whether to use Lua instead, ha. I've been meaning to ask around on the LuaJIT list and wherever KOReader discusses stuff to learn more, after spending the time actually getting LuaJIT linked into an Android project and poking around. Haven't got to it yet. This could be an absolutely massive red herring that I'm going deer-in-headlights about because it just looks off-in-the-weeds, or potentially significant. It might also be absolutely irrelevant to not-LuaJIT pursuits, but as I noted I'm not (yet) experienced enough to tell.