| Funny, I actually did the same thing for Python years ago and also talked about a "Lossless Syntax Tree", then started calling it a Full Syntax Tree since CST doesn't really fit; I wasn't aware of the existence of lib2to3 at that time. My goal was different from yours: I wanted to make writing custom refactoring code (that is, code that modifies source code) and code that works on source code a doable task. I ended up making some design decisions that I haven't found elsewhere (but this field is hard to explore): - producing JSON, because data structures don't lie to you, and for potential interoperability - each node is responsible for the formatting within itself, as opposed to lib2to3, where a node is responsible for the formatting before itself (or after, I'm not sure anymore) - the tree is designed for the human brain instead of an interpreter/compiler (for example, having lists instead of recursive structures) The project is called Baron https://github.com/pycqa/baron and was actually a means for me to work on what really interests me: the abstraction that attempts to make writing custom refactoring a doable task https://github.com/pycqa/redbaron Good luck with your project :) |
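To make the design points above concrete, here is a minimal sketch of a lossless tree for `a  = 1` as plain JSON-compatible data, where the whitespace around `=` is stored inside the assignment node itself and sequences are flat lists. The node layout, the key names (`first_formatting`, `second_formatting`) and the `dumps` helper are assumptions made up for illustration; this is not Baron's actual schema or API.

```python
import json

# Hypothetical, simplified node layout (NOT Baron's real schema).
# The assignment node owns the whitespace found inside it.
tree = [
    {
        "type": "assignment",
        "target": {"type": "name", "value": "a"},
        "first_formatting": [{"type": "space", "value": "  "}],   # before "="
        "second_formatting": [{"type": "space", "value": " "}],   # after "="
        "value": {"type": "int", "value": "1"},
    },
    {"type": "endl", "value": "\n"},
]


def dumps(node):
    """Turn the tree back into source text, character for character."""
    if isinstance(node, list):
        # Sequences are flat lists, so rendering is just concatenation.
        return "".join(dumps(child) for child in node)
    if node["type"] == "assignment":
        return (
            dumps(node["target"])
            + dumps(node["first_formatting"])
            + "="
            + dumps(node["second_formatting"])
            + dumps(node["value"])
        )
    # Leaf nodes (name, int, space, endl) carry their text in "value".
    return node["value"]


source = "a  = 1\n"
assert dumps(tree) == source
# The tree is plain data, so it survives a JSON round trip unchanged.
assert dumps(json.loads(json.dumps(tree))) == source
```

Because every character of the source lives somewhere in the tree, a refactoring tool can edit one node and re-render without disturbing the formatting of anything else.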
I think the goals are actually similar -- refactoring is a style-preserving transformation that is similar to changing the language. At Google there is a bunch of work on converting C++ 03 to 11 to 14 to 17, etc. and they're using similar techniques with Clang. That's an example of a refactoring that's also changing the language.
I think it would be nice to make an "LST-aware sed". Once you have the parsers, I don't think it's that hard to add something like that on top. The parsers are of course a lot of labor!
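As a rough illustration of what an "LST-aware sed" could look like, here is a minimal sketch built on Python's standard `tokenize` module instead of a real lossless syntax tree. The `rename_identifier` function is a hypothetical helper written for this example. It only works at the token level, so it tells identifiers apart from strings and comments, but it knows nothing about scopes or attributes.

```python
import io
import tokenize


def rename_identifier(source, old, new):
    """Rename NAME tokens equal to `old`, leaving all other text untouched."""
    # Split lines the same way the tokenizer reads them.
    lines = io.StringIO(source).readlines()
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Walk backward so edits never shift the columns of tokens still to visit.
    for tok in reversed(tokens):
        if tok.type == tokenize.NAME and tok.string == old:
            row, col = tok.start
            _, end_col = tok.end
            line = lines[row - 1]
            lines[row - 1] = line[:col] + new + line[end_col:]
    return "".join(lines)


source = 'foo  =  1  # foo stays in comments\nprint( foo, "foo" )\n'
result = rename_identifier(source, "foo", "bar")
assert result == 'bar  =  1  # foo stays in comments\nprint( bar, "foo" )\n'
```

A plain `sed 's/foo/bar/g'` would also rewrite the comment and the string literal. With a full lossless tree the same idea extends to structural patterns rather than single tokens.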