by maxdw101
1843 days ago
How scalable is this approach of using ASTs? Can you scale this to a single codebase with 500K+ lines? The article talks about multiple different code bases that together added up to 100K. Second, does one need to worry about getting this right in dynamically typed languages?
The tooling (generally built internally on top of a couple of open source packages) matters a lot here. You generally don't want to create a single pull request with hundreds of thousands of changes.
The way I've done it, and seen it done, is with tools that stagger or schedule pull requests, generally across packages (libraries, apps).
You build a database of every package in your system. If they're spread across multiple repositories (as opposed to a monorepo), you keep track of that too.
When you need to make a changeset, you first narrow down your targets (by language, using GitHub Search, OpenGrok, Sourcegraph, whatever). Then for each match, you run your AST transform. If the transform returns a non-null change, you create a pull request for that match (usually using some kind of cron job system or something clever on top of k8s).
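As a rough sketch of what "returns a non-null change" can look like, here's a Python version using only the standard `ast` module (Python 3.9+ for `ast.unparse`). The names `old_api`, `new_api`, `RenameCall` and `transform` are made up for illustration. Note that `ast.unparse` drops comments and original formatting, so real codemods usually sit on a formatting-preserving parser instead; treat this as the shape of the idea, not a production tool.

```python
import ast
from typing import Optional


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old_name(...)` into `new_name(...)`."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name
        self.changed = False  # set to True once at least one call is rewritten

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
            self.changed = True
        return node


def transform(source: str,
              old_name: str = "old_api",
              new_name: str = "new_api") -> Optional[str]:
    """Return the rewritten source, or None when there is nothing to change."""
    tree = ast.parse(source)
    rewriter = RenameCall(old_name, new_name)
    new_tree = rewriter.visit(tree)
    if not rewriter.changed:
        return None  # "null change": no pull request needed for this file
    ast.fix_missing_locations(new_tree)
    return ast.unparse(new_tree)
```

Because this works on the syntax tree rather than on text, a string literal that happens to contain `old_api` is left alone, which is the main reason to prefer an AST over search-and-replace.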
The teams that own each package can then review and merge the pull requests, or the system can auto-merge once tests pass.
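To make the workflow above a bit more concrete, here's a minimal Python sketch of the planning side: walk the package database, run the transform on each file, and group the resulting pull requests into batches so they can be opened a few at a time. Everything here (`Package`, `PullRequestPlan`, `plan_pull_requests`, `stagger`, one pull request per package) is a hypothetical shape I'm assuming for illustration. Actually opening the pull requests and scheduling the batches is left to whatever system you use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Package:
    name: str
    repo: str              # which repository the package lives in
    files: Dict[str, str]  # path -> source text


@dataclass
class PullRequestPlan:
    package: str
    repo: str
    changes: Dict[str, str]  # path -> rewritten source text


def plan_pull_requests(
    packages: List[Package],
    transform: Callable[[str], Optional[str]],
) -> List[PullRequestPlan]:
    """Run the transform over every package; plan one PR per changed package."""
    plans: List[PullRequestPlan] = []
    for pkg in packages:
        changes: Dict[str, str] = {}
        for path, source in pkg.files.items():
            new_source = transform(source)
            # None (or identical output) means nothing to change in this file.
            if new_source is not None and new_source != source:
                changes[path] = new_source
        if changes:
            plans.append(PullRequestPlan(pkg.name, pkg.repo, changes))
    return plans


def stagger(plans: List[PullRequestPlan],
            batch_size: int) -> Iterator[List[PullRequestPlan]]:
    """Yield the plans in fixed-size batches, e.g. one batch per scheduled run."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(plans), batch_size):
        yield plans[start:start + batch_size]
```

Each batch yielded by `stagger` would be handed to one scheduled run, so owning teams see a trickle of small pull requests rather than one enormous change.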
You really want to go 80/20 with your AST transform at large scale (building a script that gets it 100% right in every case would take too long), then do dry runs to see where it gets things wrong. Either tweak the transform or handle those cases manually.
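A dry run can be as simple as running the transform without writing anything back and sorting files into "changed", "unchanged" and "needs a human". Here's a small Python sketch under that assumption. `DryRunReport` and `dry_run` are hypothetical names, and the only sanity check is that the output still parses as Python; a real setup would also lean on the test suite.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DryRunReport:
    changed: List[str] = field(default_factory=list)
    unchanged: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # path -> reason


def dry_run(
    files: Dict[str, str],
    transform: Callable[[str], Optional[str]],
) -> DryRunReport:
    """Apply the transform in memory only and record what would happen."""
    report = DryRunReport()
    for path, source in files.items():
        try:
            result = transform(source)
            if result is None or result == source:
                report.unchanged.append(path)
                continue
            ast.parse(result)  # cheap check: rewritten code must still parse
            report.changed.append(path)
        except Exception as exc:  # transform crashed or produced invalid code
            report.failed[path] = f"{type(exc).__name__}: {exc}"
    return report
```

Anything that lands in `failed` is a candidate for either a tweak to the transform or a manual fix, which is the 80/20 split in practice.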
For dynamic languages, you can rely on rich AST parsers to give you more info (e.g. for JavaScript you can use TypeScript to infer some type information, even if the original code is not in TypeScript). You also rely heavily on tests to know whether you're breaking something.