by pshc 4768 days ago
I've actually been trying to make 3) happen with a long-standing project of mine, because serializing an AST to text discards information. In a (proper) AST, human-readable names are never used as pointer values.

When you bind something (reference a variable), you just point at that variable's node, rather than mentioning it by string. This removes a huge class of artificial problems brought on by plaintext source code (symbol name clashes, namespaces, overeager imports, var name typos, shadowing, and other programmer-compiler miscommunications), but introduces some editor UI concerns (e.g. indicating shadowing).
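A minimal sketch of what "point at the node" could look like, assuming hypothetical `Def` and `Ref` classes (these names are made up for illustration and are not from any real project). Two variables can share a spelling without clashing, because a reference is resolved by node identity rather than by string:

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by field values
class Def:
    """A variable definition; the only place the human-readable name lives."""
    name: str


@dataclass(eq=False)
class Ref:
    """A use of a variable: holds a pointer to its Def, not a string."""
    target: Def


outer_x = Def("x")
inner_x = Def("x")  # same spelling, different variable (shadowing)

use_outer = Ref(outer_x)
use_inner = Ref(inner_x)

# Spelling matches, but each reference still points at its own definition.
same_spelling = use_outer.target.name == use_inner.target.name
same_variable = use_outer.target is use_inner.target
print(same_spelling, same_variable)
```

The editor still has to show the user which `x` is which, which is the UI concern mentioned above.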

There's a whole host of other advantages to saving ASTs directly. One of them is granular, semantic diffs like in the OP. I'm convinced it's the future... there are a lot of UI problems to solve in a solid, practical editor for it though.


I don't see why a variable can't have a "name" property, even if it's not used as a pointer value. I don't see why ASTs should have less information than source code. IMO they're equivalent, and just different forms of representation.

ASTs are easier to manipulate with code; source code is easier to manipulate with text editors. We don't really have good tools for manipulating ASTs yet.

The same amount of information exists, but is normalized. The variable _definition_ absolutely will have a "name" annotation, but copying that name to each binding site is brittle. To render a binding, just dereference the pointer to the original definition and use its name.
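Roughly, as a sketch with hypothetical `Definition`, `Binding`, and `render` names (assumptions for illustration, not an actual implementation): the name is stored once on the definition, so a rename is a single write and every binding site renders the new name without being touched.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Definition:
    """Owns the "name" annotation; stored exactly once."""
    name: str


@dataclass(eq=False)
class Binding:
    """A binding site; stores a pointer to the definition, never a copy of the name."""
    definition: Definition


def render(binding: Binding) -> str:
    # Dereference the pointer and use the definition's current name.
    return binding.definition.name


total = Definition("total")
uses = [Binding(total), Binding(total)]

before = [render(b) for b in uses]
total.name = "grand_total"  # rename in one place
after = [render(b) for b in uses]
```

With names copied to each binding site, the same rename would need a find-and-replace that can miss a site or hit an unrelated variable with the same spelling.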

I'm working on it.

Me too. :)
While there are certainly advantages to storing code in an AST format, I don't think any of these are significant enough to overcome the hugely important advantage of legacy support. Maybe in the long term it will make sense to replace text-based code with a new AST format, but for now a slightly crufty but ultimately isomorphic text format seems entirely sufficient.