Hacker News new | ask | show | jobs
by pedrozieg 187 days ago
What I like about this writeup is that it surfaces a tension most “let’s build a compiler” tutorials skip: the AST is both a data structure and a UX boundary. Super-flat layouts are fantastic for cache and memory, but they’re hostile to all the things humans care about (debuggable shapes, easy instrumentation, ad-hoc traversals, “just print this node and its children” in a debugger). A lot of production compilers quietly solve that by having two tiers: a nice, inefficient tree for diagnostics and early passes, and increasingly flattened / interned / arena-allocated forms as you move toward optimization and codegen.

The interesting question for me is where the crossover is now that IDEs and incremental compilation dominate the workload. If your front-end is effectively a long-running service, it might be worth keeping a friendlier AST around and only using a super-flat representation for hot paths like analysis passes or bulk refactors. Otherwise you risk saving a few hundred MB while spending engineer-months making every new pass fight the layout.

1 comments

What about this representation is hostile to humans and ad-hoc traversals? Don't convenience "getters" basically solve usability?
(author here) If you run the parser under a debugger like lldb, then attempt to inspect the AST of a program, it appears as an array of u64. Not very useful, unless you work on special support for debuggers (such as a python script to unpack it in lldb). Compare that to a tree of pointers, you can "expand" nodes without any extra effort.
I guess that makes sense. But I don't ever look at AST in a debugger, and if I needed to, I'd just write some python helpers.