by shurcooL 4768 days ago
> 3) I get your whole in-the-future-we-store-ASTs argument, but that's certainly not the case today. Today we store text, and I don't see that changing. Could there also be a diff, perhaps even another mode that, counter to just dealing with structure, deals with formatting? ie, find the ranges that contribute nothing to the AST and then diff those textually.

Agreed. I think it's likely code will still be stored as text, but parsed into ASTs easily. Consider that the Go standard library has a full language parser built right in, so getting an AST from a .go source file takes about 3 lines of code.
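To make the "about 3 lines" claim concrete, here is a minimal sketch using the standard go/parser, go/token and go/ast packages. The helper name parseFuncNames and the sample source are made up for illustration; the actual parsing is the NewFileSet and ParseFile calls.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// parseFuncNames (hypothetical helper) parses Go source text into an AST
// and returns the names of its top-level function declarations.
func parseFuncNames(src string) ([]string, error) {
	// These two calls are the whole text-to-AST step.
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	// Walk the top-level declarations and keep the function names.
	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := "package demo\n\nfunc Hello() {}\n\nfunc World() {}\n"
	names, err := parseFuncNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Everything after ParseFile is ordinary tree walking, which is the point: once the parser ships with the language, the text form costs very little.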

Because you can get the code back from the AST easily (while maintaining spacing), it really doesn't matter which form you save. You can use tools to edit the text form or the AST form without any duplication.
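A rough sketch of the round trip, assuming the standard go/format package is an acceptable printer. The helper name roundTrip is made up for illustration. One caveat: format.Node prints in gofmt style, so source that is already gofmt-formatted comes back unchanged, while other spacing gets normalized rather than preserved.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
)

// roundTrip (hypothetical helper) parses source text into an AST and
// prints the AST back out as text.
func roundTrip(src string) (string, error) {
	fset := token.NewFileSet()
	// ParseComments keeps comments attached so they survive the trip.
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src := "package demo\n\n// Add returns a + b.\nfunc Add(a, b int) int {\n\treturn a + b\n}\n"
	out, err := roundTrip(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == src)
}
```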


I've actually been trying to make 3) happen with a long-standing project of mine, because serializing an AST to text discards information. In a (proper) AST, human-readable names are never used as pointer values.

When you bind something (reference a variable), you just point at that variable's node, rather than mentioning it by string. This removes a huge class of artificial problems brought on by plaintext source code (symbol name clashes, namespaces, overeager imports, var name typos, shadowing, and other programmer-compiler miscommunications), but introduces some editor UI concerns (e.g. indicating shadowing).

There are a whole host of other advantages of saving as ASTs directly. One of them is granular, semantic diffs like in the OP. I'm convinced it's the future... there are a lot of UI problems to solve in a solid, practical editor for it though.

I don't see why a variable can't have a "name" property, even if it's not used as a pointer value. I don't see why ASTs should have less information than source code. IMO they're equivalent, and just different forms of representation.

ASTs are easier to manipulate with code; source code is easier to manipulate with text editors. We don't really have good tools for manipulating ASTs yet.

The same amount of information exists, but is normalized. The variable _definition_ absolutely will have a "name" annotation, but copying that name to each binding site is brittle. To render a binding, just dereference the pointer to the original definition and use its name.
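As a toy illustration of that normalization (not the representation used by any real project; the Def and Ref types are made up for this sketch), the name lives only on the definition node and each binding site holds a pointer to it:

```go
package main

import "fmt"

// Def is a variable definition. It is the only node that stores the name.
type Def struct {
	Name string
}

// Ref is a binding site. It points at its definition instead of
// repeating the name as a string.
type Ref struct {
	Target *Def
}

// Render returns the text to display for a binding site by
// dereferencing the pointer to the definition.
func (r Ref) Render() string {
	return r.Target.Name
}

func main() {
	outer := &Def{Name: "count"}
	inner := &Def{Name: "count"} // same spelling, but a distinct node

	a := Ref{Target: outer}
	b := Ref{Target: inner}

	// The spellings match, yet the bindings are unambiguous.
	fmt.Println(a.Render(), b.Render(), a.Target == b.Target) // count count false

	// Renaming touches one node; every binding site follows.
	outer.Name = "total"
	fmt.Println(a.Render(), b.Render()) // total count
}
```

Two definitions that happen to share a spelling never collide, which is why shadowing turns into an editor display question rather than a compiler one.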

I'm working on it.

Me too. :)

While there are certainly advantages to storing code in an AST format, I don't think any of them is significant enough to overcome the hugely important advantage of legacy support. Maybe in the long term it will make sense to replace text-based code with a new AST format, but for now a slightly crufty but ultimately isomorphic text format seems entirely sufficient.