Hacker News
by kaihong_deng 101 days ago
Really interesting writeup. What stood out to me most was the shift from the earlier node execution path to the streamed path. The benchmark gap between execute_r_nodes and execute_stream is huge, and the latter getting relatively close to the handwritten C++ baseline is the part I keep thinking about.

After building this, where do you think most compiler complexity actually comes from? My impression from your post is that a lot of the “millions of lines” are not from the core syntax-to-execution path itself, but from language surface area, tooling, diagnostics, optimization passes, and long-tail ecosystem baggage.

1 comment

The streaming path (essentially a JIT) was actually from the early architecture three weeks ago, though I'm glad you read even the first post. On the current architecture, performance hasn't been a target yet, but the core hasn't changed: I could build a MIX that streams and it would reach the same benchmark.

And I love the question. A lot of the complexity comes from the management of seams: places where we have to go from one representation of information to another. The tooling, diagnostics, and optimization passes are as large as they are precisely because of these seams. Consider a liveness pass in LLVM, which spends a lot of time reconstructing information the compiler threw away so it could emit SSA. In GDSL, a liveness pass is simply a set of handlers in the e_stage. In the example I snuck into GDSL-C, print statements stamp liveness tokens onto their children via qualifiers, and at assignment nodes, those without such tokens are killed. I can do the logic in a straightforward manner because we have all the information to work with: no seams, no SSA to derive scopes from. That's why a subset of it fits in 80 lines instead of 80,000.
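
For anyone trying to picture the token-stamping idea, here is a rough Python sketch. To be clear, this is not GDSL or GDSL-C code: the `Assign` and `Print` classes, the `live` set standing in for liveness tokens, and the function name are all made up for illustration, and it only handles straight-line code with no branches or loops. It walks the statements backwards, lets prints stamp their operands as live, and drops any assignment whose target carries no token.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    target: str      # name being written
    operands: list   # names read by the right-hand side


@dataclass
class Print:
    operands: list   # names read by the print statement


def eliminate_dead_assignments(program):
    """Backward walk over straight-line code (hypothetical helper).

    `live` plays the role of the liveness tokens: a name is in the set
    when some later statement still needs its value.
    """
    live = set()
    kept = []
    for node in reversed(program):
        if isinstance(node, Print):
            # A print stamps a liveness token onto everything it reads.
            live.update(node.operands)
            kept.append(node)
        elif isinstance(node, Assign):
            if node.target not in live:
                # No token on the target: the assignment is killed.
                continue
            # This write satisfies the later readers, so the token is used up...
            live.discard(node.target)
            # ...and the values it reads now become live in turn.
            live.update(node.operands)
            kept.append(node)
    kept.reverse()
    return kept


program = [
    Assign("a", []),      # a = 1
    Assign("b", ["a"]),   # b = a + 1
    Assign("c", ["a"]),   # c = a * 2, never printed
    Print(["b"]),         # print(b)
]

result = eliminate_dead_assignments(program)
print([n.target for n in result if isinstance(n, Assign)])  # ['a', 'b']
```

The assignment to `c` is dropped because nothing downstream ever stamps it, while `a` survives because the kept assignment to `b` reads it. A real pass would also need to handle control flow, which this sketch ignores.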