by ckok 3217 days ago
Most compilers sort of work this way, except that they don't have an intermediate format. They take source, turn it into tokens, turn it into a tree of sorts, run several passes over it till the end result is reached (whatever the target is).

My own compilers turn 4 different input languages into the same parse tree (hand-written tokenizers/parsers), then run several passes over it until only a very basic set of instructions is left, at which point they generate code for whatever backend was picked.
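To make the shape of that pipeline concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration (the node classes, the pass names `lower_sub` and `fold_constants`, and the toy stack-machine backend); it is not taken from any real compiler.

```python
from dataclasses import dataclass


# Hypothetical node types for the shared tree that every front end would produce.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Neg:
    operand: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Sub:
    left: object
    right: object


def lower_sub(node):
    """Pass 1: rewrite Sub(a, b) as Add(a, Neg(b)) so later passes never see Sub."""
    if isinstance(node, (Num, Var)):
        return node
    if isinstance(node, Neg):
        return Neg(lower_sub(node.operand))
    if isinstance(node, Add):
        return Add(lower_sub(node.left), lower_sub(node.right))
    if isinstance(node, Sub):
        return Add(lower_sub(node.left), Neg(lower_sub(node.right)))
    raise TypeError(f"unknown node: {node!r}")


def fold_constants(node):
    """Pass 2: evaluate Neg and Add when their operands are already constants."""
    if isinstance(node, Neg):
        inner = fold_constants(node.operand)
        return Num(-inner.value) if isinstance(inner, Num) else Neg(inner)
    if isinstance(node, Add):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return node


def emit_stack_code(node):
    """Backend: flatten the lowered tree into instructions for a toy stack machine."""
    if isinstance(node, Num):
        return [("push", node.value)]
    if isinstance(node, Var):
        return [("load", node.name)]
    if isinstance(node, Neg):
        return emit_stack_code(node.operand) + [("neg",)]
    if isinstance(node, Add):
        return (emit_stack_code(node.left)
                + emit_stack_code(node.right)
                + [("add",)])
    raise TypeError(f"backend cannot handle: {node!r}")


def compile_expr(tree):
    """Run every pass in order, then hand the reduced tree to the backend."""
    for run_pass in (lower_sub, fold_constants):
        tree = run_pass(tree)
    return emit_stack_code(tree)
```

With these definitions, `compile_expr(Sub(Var("x"), Num(2)))` yields `[("load", "x"), ("push", -2), ("add",)]`: the backend only ever has to know about the small set of nodes that survives the passes.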

The only big difference is that the process outlined in the article has an actual textual intermediate format, if I understand it correctly. That sounds like a lot of extra work for little gain. A simpler approach might be to have a "ToString" method on all nodes at all levels that outputs info clear enough to understand from within a debugger.
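A rough sketch of that "ToString" idea, written in Python where the equivalent hooks are `__str__` and `__repr__`. The `Node`, `Const`, and `BinOp` classes are hypothetical; the point is only that one base-class method can give every node at every level a readable, S-expression-like dump.

```python
class Node:
    """Hypothetical base class shared by the nodes of every tree level."""

    def children(self):
        # Leaf nodes have no children; interior nodes override this.
        return ()

    def label(self):
        # Default label is just the class name; nodes may add details.
        return type(self).__name__

    def __str__(self):
        kids = " ".join(str(child) for child in self.children())
        return f"({self.label()} {kids})" if kids else f"({self.label()})"

    # Debuggers and the REPL show repr(), so reuse the same readable form.
    __repr__ = __str__


class Const(Node):
    def __init__(self, value):
        self.value = value

    def label(self):
        return f"Const {self.value}"


class BinOp(Node):
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def label(self):
        return f"BinOp {self.op}"

    def children(self):
        return (self.left, self.right)
```

For example, `str(BinOp("+", Const(1), Const(2)))` gives `(BinOp + (Const 1) (Const 2))`, which is what would show up when inspecting the node in a debugger.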


It's not that there's a textual intermediate format: just a defined intermediate tree structure. Some examples are given in the nanopass framework documentation [0]. So it's not that the input language is processed into an intermediate format, which is printed to a string and then read by the next pass; the intermediate languages are all in various forms of trees. See also the paper on writing Chez Scheme as a nanopass system [1].

[0] https://docs.racket-lang.org/nanopass/index.html

[1] https://www.cs.indiana.edu/~dyb/pubs/commercial-nanopass.pdf
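To illustrate the "defined intermediate tree structure" point, here is a small Python analogy. It is not the nanopass framework's API (that is a Scheme/Racket library); the tuple encoding, the language sets `L0` and `L1`, and the helpers `check` and `remove_let` are all assumptions invented for this sketch. Each intermediate language is just a set of allowed tree forms, and a pass maps trees of one language to trees of the next without ever going through text.

```python
# Trees are nested tuples whose first element is a tag, e.g.
#   ("num", 1), ("var", "x"), ("let", name, value, body),
#   ("lambda", param, body), ("app", func, arg)

# Hypothetical language definitions: the set of forms each language allows.
L0 = {"num", "var", "let", "lambda", "app"}
L1 = L0 - {"let"}  # same as L0, but with "let" removed


def check(tree, language):
    """Raise ValueError if the tree uses a form the language does not define."""
    tag = tree[0]
    if tag not in language:
        raise ValueError(f"form {tag!r} is not part of this language")
    for child in tree[1:]:
        if isinstance(child, tuple):  # skip plain names and numbers
            check(child, language)


def remove_let(tree):
    """Pass L0 -> L1: rewrite (let x v body) as ((lambda x body) v)."""
    tag = tree[0]
    if tag in ("num", "var"):
        return tree
    if tag == "let":
        _, name, value, body = tree
        return ("app", ("lambda", name, remove_let(body)), remove_let(value))
    if tag == "lambda":
        _, param, body = tree
        return ("lambda", param, remove_let(body))
    if tag == "app":
        _, func, arg = tree
        return ("app", remove_let(func), remove_let(arg))
    raise ValueError(f"unknown form: {tag!r}")
```

A usage example: the source tree is valid in `L0`, and the output of the pass is valid in `L1`, with the tree handed over directly as a data structure.

```python
source = ("let", "x", ("num", 1), ("var", "x"))
check(source, L0)
lowered = remove_let(source)
check(lowered, L1)
```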

That's a pretty interesting approach then; I like it.