

by shalabhc 3212 days ago

It's not about parsing being difficult or easy (e.g. you would still have to parse an abstract structure into a syntax tree specific to your language semantics). It's about making a structured form the canonical baseline (instead of the canonical form being a 'sequence of lines', i.e. text).

Consider that every programming language and every config language first invents a new syntax to encode a tree-like structure (typically using a combination of curly braces, other brackets, keywords, indentation, etc.), but the code itself is saved as 'text'. This is a lossy encoding: all a generic reader such as `git` or `grep` can infer is that the file contains a 'sequence of lines', so it can only offer line-based operations (git diffs are line-based, grep searches are line-based, etc.), when in fact operations based on the tree structure would be more meaningful.

If a tree-based format were the canonical baseline, diffs could display the location of the node added (e.g. 'Added <Class X> -> <Function Y>') without any language-specific parsing knowledge. Similarly, most editors could provide a 'tree view' and 'jump-next', 'jump-up', etc. based on context, again without knowing language-specific details. Further, many internal representations of programs (e.g. intermediate representations in compilers) also use trees and could potentially be exported into one of these forms, so that the plethora of tools works with them too.
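
To make the diff idea concrete, here is a rough sketch in Python of what a language-agnostic tree diff might look like. Everything in it is hypothetical: the `Node` shape (a `kind`, a `name`, and children), the `tree_diff` function, and the rule of matching children by `(kind, name)` are just assumptions for illustration, not an existing format or tool. The point is only that the diff never needs to know what "Class" or "Function" mean.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic tree node (hypothetical format); `kind` is opaque to the tools."""
    kind: str
    name: str
    children: list[Node] = field(default_factory=list)

    def label(self) -> str:
        return f"<{self.kind} {self.name}>"


def tree_diff(old: Node, new: Node, path: tuple[str, ...] = ()) -> list[str]:
    """List added/removed nodes by their path, matching children on (kind, name).

    Assumes `old` and `new` are the same node in two versions of the tree.
    """
    here = path + (new.label(),)
    old_children = {(c.kind, c.name): c for c in old.children}
    new_children = {(c.kind, c.name): c for c in new.children}
    changes: list[str] = []

    for key, child in new_children.items():
        if key not in old_children:
            # here[1:] drops the root so paths start at the first real container
            changes.append("Added " + " -> ".join(here[1:] + (child.label(),)))
        else:
            changes.extend(tree_diff(old_children[key], child, here))

    for key, child in old_children.items():
        if key not in new_children:
            changes.append("Removed " + " -> ".join(here[1:] + (child.label(),)))

    return changes


old = Node("File", "example", [Node("Class", "X", [Node("Function", "A")])])
new = Node("File", "example", [
    Node("Class", "X", [Node("Function", "A"), Node("Function", "Y")]),
])

print(tree_diff(old, new))  # ['Added <Class X> -> <Function Y>']
```

A real tool would also have to deal with moved, renamed, and modified nodes, which this sketch ignores; it only shows that node locations fall out of the structure itself rather than from a language-specific parser.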

(BTW, I'm not saying a tree is the best generic structure to replace text; I'm just using it as an example to argue for the advantages of a generalized, extensible structure over plain text.)