by stingraycharles 2139 days ago
I once wrote my own virtual machine in college, complete with compiler and assembler, and I cannot recommend doing this enough. The virtual machine part especially is not nearly as difficult as you would imagine, and to this day (15 years later) I still rely on the things I learned there.

The knowledge you gain from implementing a virtual machine translates reasonably well to the inner workings of a CPU, and you’ll have a much better understanding of things like stacks, frame pointers and the overhead of calling a function. It will be completely obvious to you why “i++” is slower than “++i”.

Thanks for sharing this article.

3 comments

> It will be completely obvious to you why “i++” is slower than “++i”

...but it's not!

All you have to do is perform the most basic of optimizations: before generating code for an expression, check whether the value of that expression is used. If not, don't bother generating an intermediate for the result. Source: I'm writing a C compiler and implemented this optimization just today. (In an 'industrial-grade compiler', you probably want to elide this optimization and instead do super complex control flow analysis on the resulting SSA to see that the intermediate is dead code. But for a toy compiler/VM, little tricks can gain you a lot of codegen quality for little effort.)
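To make that concrete, here is a rough sketch of what such a check could look like in a toy code generator. Everything in it is made up for illustration: the textual "instructions", the temporary name `t0`, and the function names are assumptions, not taken from any real compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical textual instructions, just for illustration.
using Code = std::vector<std::string>;

// Emit code for `var++`. If the value of the expression is not used,
// skip the temporary that would preserve the old value.
// Returns the location holding the expression's value,
// or an empty string when no value was requested.
std::string gen_post_increment(Code& out, const std::string& var, bool result_used) {
    assert(!var.empty());
    if (!result_used) {
        out.push_back("inc " + var);           // identical to the code for ++var
        return "";
    }
    const std::string tmp = "t0";              // hypothetical temporary
    out.push_back("mov " + tmp + ", " + var);  // save the old value
    out.push_back("inc " + var);
    return tmp;
}

// Emit code for `++var`. The new value lives in `var` itself,
// so no temporary is needed either way.
std::string gen_pre_increment(Code& out, const std::string& var, bool /*result_used*/) {
    assert(!var.empty());
    out.push_back("inc " + var);
    return var;
}
```

With `result_used == false`, both functions emit the same single instruction, which is the whole point: a standalone `i++;` and `++i;` end up as identical code.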

> inner workings of a CPU

...if only. Lots of crazy stuff going on in a CPU that doesn't even start to come up in most VMs.

Technically, that's a bit off topic because this is about compilers and code generation, while a VM defines the basic semantics compilers work with.

For VMs, what is relevant is whether you implement an "inc (reg)" or not, that is, whether you choose to take the RISC path (a small set of mostly orthogonal instructions) or the CISC path (lots of microcode, things like "repnz scasb" in x86 assembly, or elaborate addressing schemes).

This is actually somewhat of a false dichotomy, as you can have a RISC-style instruction set plus a few "high level" instructions for things you expect to do a lot - like, for instance, array operators à la APL/J.
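As a small illustration of the "inc (reg)" choice, here is a minimal register VM sketch. The opcode set, the `Instr` encoding, and the four-register file are all assumptions made up for this example; it only shows that a dedicated increment saves an instruction compared to loading a constant and adding it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op : std::uint8_t { LoadImm, Add, Inc, Halt };

// Hypothetical encoding: `a` is the destination register,
// `b` is an immediate (LoadImm) or a source register (Add).
struct Instr { Op op; int a; int b; };

// Runs the program on four registers and returns the number of
// instructions executed (including Halt).
int run(const std::vector<Instr>& prog, int (&regs)[4]) {
    int executed = 0;
    for (const Instr& in : prog) {
        assert(in.a >= 0 && in.a < 4);
        ++executed;
        switch (in.op) {
            case Op::LoadImm: regs[in.a] = in.b; break;
            case Op::Add:
                assert(in.b >= 0 && in.b < 4);
                regs[in.a] += regs[in.b];
                break;
            case Op::Inc:     ++regs[in.a]; break;
            case Op::Halt:    return executed;
        }
    }
    return executed;
}
```

Without `Inc`, incrementing a register takes a `LoadImm` of 1 into a scratch register followed by an `Add`; with it, one instruction does the job. Whether that is worth an extra opcode is exactly the design trade-off described above.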

> All you have to do is perform the most basic of optimizations

I mean, of course this is possible, and every compiler everywhere hopefully optimizes this, but at the same time you need to understand why one is faster than the other in order to understand why it makes sense to optimize it. In the end, you’ll acquire a more in-depth understanding of these semantics.

Regarding the inner workings of a CPU, I wasn’t claiming you will completely understand a CPU; my god, I hope you don’t, because it’s a crazy rabbit hole.

Again, I was thinking about the higher level, like touching the tip of the iceberg and seeing what’s in there. Many programmers have no idea how function calls work, or how functions return values by using the stack, etc.

Bottom line: just because you don’t learn everything doesn’t mean you won’t learn something.

Better yet would be if every part of the C spec that has undefined behavior threw an error at compile time.
That may be why x++ was slower than ++x in the original C compiler (and occasionally still is when used inside another expression, though quite rarely these days with modern architectures and optimizing compilers).

Regardless, I much prefer to write ++x, because (a) I pronounce it “increase x”, and I haven’t found a concise way to say “x’s current value, but it is increased afterwards”; and (b) all other unary operators are prefix, which makes it easier to reason uniformly about things like order of evaluation.

In some contexts postfix is simpler - e.g. a memcpy-like implementation

    while (n--) *t++ = *s++;
However, such contexts are not very common in my experience.
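For anyone who wants to try the one-liner, here it is wrapped in a function, next to a prefix-only version for comparison. The function names and the choice of `unsigned char` are assumptions for the sketch, and like memcpy it assumes the two regions do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Postfix version: copy n bytes from s to t.
void copy_bytes_postfix(unsigned char* t, const unsigned char* s, std::size_t n) {
    assert(n == 0 || (t != nullptr && s != nullptr));
    while (n--) *t++ = *s++;
}

// The same loop written with prefix operators only; each step that the
// postfix form folds into one expression becomes its own statement.
void copy_bytes_prefix(unsigned char* t, const unsigned char* s, std::size_t n) {
    assert(n == 0 || (t != nullptr && s != nullptr));
    for (; n != 0; --n) {
        *t = *s;
        ++t;
        ++s;
    }
}
```

Both do the same work; the postfix form is just more compact because "use the old value, then advance" is exactly what the loop needs.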
I thought that the i++ vs. ++i difference was related to the internal data structures of C++ iterators: with the former you need to keep two states, compared to just one with the latter, for that fraction of the execution.
thanks for the downvotes, I'm out of here for good!
What does ++ have to do with C++ iterators?

  int i = 0;
  i++;
How could that possibly involve an iterator? Moreover, C doesn't even have iterators.
You can use ++ on an iterator. Indeed, you often do, e.g. in loops. ++iterator is preferable, and for consistency (and so you don't have to think about it) it's often recommended in C++ environments to just use pre-increment generally.

(Copying an iterator might be a lot more expensive than just copying an int, and might not be effectively optimized out by the compiler - e.g. if someone builds an iterator that's reference-counted to an owning object or something.)
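A quick way to see the extra copy is an iterator-like type that counts how often it gets copied. This is only a sketch: `CountingIter` and `g_copies` are hypothetical names, and the type stands in for an iterator whose copy is expensive.

```cpp
#include <cassert>

// Global copy counter, purely for demonstration.
int g_copies = 0;

struct CountingIter {
    int pos = 0;

    CountingIter() = default;
    CountingIter(const CountingIter& other) : pos(other.pos) { ++g_copies; }
    CountingIter& operator=(const CountingIter&) = default;

    // Pre-increment: advance in place and return a reference, no copy.
    CountingIter& operator++() {
        ++pos;
        return *this;
    }

    // Post-increment: has to hand back the old state, so it must copy.
    CountingIter operator++(int) {
        CountingIter old = *this;  // at least one copy happens here
        ++pos;
        return old;                // the compiler may or may not elide a second copy
    }
};
```

After `++it` the counter is still zero; after `it++` it is at least one, even if the returned value is thrown away, because the copy constructor has a visible side effect the compiler is not allowed to drop. For a plain int there is no such side effect, which is why the difference disappears there.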