The balancing act is that the fancier your sanity check, the greater the chance that something slips through its cracks too. Walking two strings in parallel is very simple and hard to get wrong. Traversing an AST and accidentally skipping a branch is exactly the kind of easy-to-make bug that the sanity check is designed to catch.
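To make the "walk two strings in parallel" idea concrete, here is a minimal sketch in Python. It assumes the check simply compares the non-whitespace characters of the input and the output; the function name `same_non_whitespace` is made up for illustration and is not taken from any real formatter.

```python
def same_non_whitespace(source: str, formatted: str) -> bool:
    """Return True if both strings contain the same non-whitespace
    characters in the same order (hypothetical sanity check)."""
    i, j = 0, 0
    while True:
        # Skip whitespace independently on each side.
        while i < len(source) and source[i].isspace():
            i += 1
        while j < len(formatted) and formatted[j].isspace():
            j += 1
        # If either side is exhausted, both must be.
        if i == len(source) or j == len(formatted):
            return i == len(source) and j == len(formatted)
        # Any differing character means the formatter changed the code.
        if source[i] != formatted[j]:
            return False
        i += 1
        j += 1
```

Something like `same_non_whitespace("foo( a,b )", "foo(a, b)")` passes, while output that dropped an argument fails. The price of the simplicity is that this cannot notice two tokens being glued together, since `a b` and `ab` look identical once whitespace is ignored.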
What I'd like to do is something somewhere in the middle where I walk the token stream and check that every token of the input ended up in the output, but I haven't figured out a simple and fast way to do that yet. Performance is particularly tricky because I obviously don't want to burn a bunch of CPU cycles on a sanity check that exists only to catch bugs.
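Purely as an illustration of the shape such a check might take, here is a rough Python sketch that uses the standard `tokenize` module as a stand-in lexer. The helper names are hypothetical, it only works on Python source, and it says nothing about whether the approach is fast enough for a real formatter.

```python
import io
import tokenize
from itertools import zip_longest

# Token kinds that only encode layout, which a formatter may change freely.
LAYOUT = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}

def significant_tokens(source: str):
    """Lazily yield (type, text) for every non-layout token."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in LAYOUT:
            yield tok.type, tok.string

def same_tokens(source: str, formatted: str) -> bool:
    """True if both sources produce the same significant token sequence."""
    pairs = zip_longest(significant_tokens(source),
                        significant_tokens(formatted))
    # Stops at the first mismatch; a missing token shows up as None.
    return all(a == b for a, b in pairs)
```

Unlike a plain character walk, this would catch `a b` being collapsed into `ab`, but it costs a second lexing pass over the output, which is exactly the performance worry.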
I've always thought it would make sense for formatters to be baked into the toolchain so that they can reuse the language's parser (presumably exposed as a library). The formatter would then parse the source to an AST and print it back out, so the output is guaranteed to be correct and normalized. That doesn't seem to be how most formatters work in practice, though I'm not sure whether that's for performance reasons or because language toolchains don't expose their parsers.
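Python happens to expose its parser this way, so a toy version of the parse-then-print idea can be sketched with the standard `ast` module (`ast.unparse` needs Python 3.9 or later). The function names below are made up for illustration, and this is not how any particular formatter is implemented.

```python
import ast

def format_via_ast(source: str) -> str:
    """Parse with the language's own parser, then print the AST back out."""
    tree = ast.parse(source)
    return ast.unparse(tree)

def same_ast(a: str, b: str) -> bool:
    """Check that two sources parse to structurally identical trees."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

messy = "x=1+2"
clean = format_via_ast(messy)
print(clean)                   # x = 1 + 2
print(same_ast(messy, clean))  # True
```

The catch is visible right away: comments and the author's original line breaks are not part of the AST, so `format_via_ast("x = 1  # note")` silently drops the comment. That may be part of why real formatters tend to work closer to the token level.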
Good point. I hadn't really thought about it, but the name makes it pretty clear that it's built on clang's tooling. I've only worked a small amount in C++, and that was years ago, but I distinctly remember feeling like clang-format was essentially perfect from my perspective, so it's nice to know that my abstract ideals bear out in practice.