Hacker News
by photonbucket 816 days ago
Is there any tooling that can tell you exactly which parts of a crate you actually use and produce a minimized version for vendoring/auditing?
4 comments

I like this idea. Theoretically, the compiler already has the machinery to remove dead code. The next step could be to package up just the source you touch.
It's not trivial to do if you have multiple build targets and features.

I.e. you would need to vendor one version for each features x target triple combination, combined with cfg expansion and (proc) macro expansion/inlining, and then run a static reachability analysis to prune all unused code (and dependencies). That would likely not be good enough, so you would probably also need some runtime code coverage analysis to find "likely dead code" (code that is not statically provable to be dead), followed by manual choices about what to keep or remove, combined with some bisecting/testing to make sure the choices are sane.
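
To give a feel for why the features x target triple matrix gets large, here is a rough sketch that simply enumerates every feature subset for every target. The feature and target names are made up for illustration, and it ignores the fact that real crates have features that imply or exclude each other, so treat it as an upper bound rather than what a real tool would build.

```rust
/// Enumerate every (target, enabled-feature-set) pair.
/// Each pair would, in the worst case, need its own minimized copy of a dependency.
fn build_matrix<'a>(
    features: &[&'a str],
    targets: &[&'a str],
) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut out = Vec::new();
    for &target in targets {
        // Each bitmask value selects one subset of the feature list.
        for mask in 0u32..(1u32 << features.len()) {
            let enabled: Vec<&str> = features
                .iter()
                .enumerate()
                .filter(|(i, _)| mask & (1u32 << *i) != 0)
                .map(|(_, f)| *f)
                .collect();
            out.push((target, enabled));
        }
    }
    out
}

fn main() {
    // Hypothetical feature names and two real-looking target triples.
    let features = ["std", "serde", "alloc"];
    let targets = ["x86_64-unknown-linux-gnu", "wasm32-unknown-unknown"];
    let matrix = build_matrix(&features, &targets);
    // 2 targets * 2^3 feature subsets = 16
    println!("{} configurations", matrix.len());
}
```

With only three features and two targets that is already 16 configurations, and each additional feature doubles the count.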

AFAIK such a tool doesn't exist.

And it's non-trivial.

But it's also very viable to create.

You can get that info from code coverage, via `cargo llvm-cov` etc, though that would require exercising all code paths into the deps or else you might underestimate how much of the deps you need to vendor. But at least if you underestimate in this way, you'll probably just get a compiler error rather than anything breaking at runtime.
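
As a rough sketch of the coverage approach: if you export coverage in LCOV format (for example with `cargo llvm-cov --lcov --output-path lcov.info`), you could scan the report for lines in vendored files that were actually executed. The `vendor/` prefix and the function name below are assumptions for illustration, and whether dependency files show up in the report at all depends on how coverage is configured. The parser only understands the `SF:`, `DA:` and `end_of_record` records.

```rust
use std::collections::BTreeMap;

/// For each source file whose path starts with `dep_prefix`, collect the
/// line numbers that were executed at least once according to an LCOV report.
fn covered_lines(lcov: &str, dep_prefix: &str) -> BTreeMap<String, Vec<u32>> {
    let mut result: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    // File of the record currently being read; None if it is not a dependency.
    let mut current: Option<String> = None;
    for line in lcov.lines() {
        let line = line.trim();
        if let Some(path) = line.strip_prefix("SF:") {
            current = if path.starts_with(dep_prefix) {
                Some(path.to_string())
            } else {
                None
            };
        } else if let Some(rest) = line.strip_prefix("DA:") {
            // Record shape: DA:<line number>,<execution count>[,<checksum>]
            let mut parts = rest.split(',');
            let line_no = parts.next().and_then(|s| s.parse::<u32>().ok());
            let hits = parts.next().and_then(|s| s.parse::<u64>().ok());
            if let (Some(file), Some(n), Some(h)) = (current.as_ref(), line_no, hits) {
                if h > 0 {
                    result.entry(file.clone()).or_default().push(n);
                }
            }
        } else if line == "end_of_record" {
            current = None;
        }
    }
    result
}

fn main() {
    // Tiny hand-written report; the paths are hypothetical.
    let sample = "SF:vendor/foo/src/lib.rs\nDA:1,3\nDA:2,0\nDA:5,1\nend_of_record\n\
                  SF:src/main.rs\nDA:1,1\nend_of_record\n";
    for (file, lines) in covered_lines(sample, "vendor/") {
        println!("{file}: {lines:?}");
    }
}
```

Anything not listed is only "not seen during this run", which is exactly the underestimation risk described above.
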
I have been spitballing about this recently too [1]. The way I'd imagine it would work is the toolchain takes one pass over your crate, compiles everything, then takes another pass to trim all the dead code from your vendored deps. Then your git diff basically has your code + all the lines of all your deps that didn't get trimmed.

There would probably need to be some more work to make it more user-friendly, but I think it's really important that all the code which ultimately ends up in your binary goes in the diff, otherwise reviewers won't actually look at it.

Disclaimer: I don't know enough about compilers, or the Rust toolchain specifically, to know if this is even possible or whether it would actually help anyone in the real world. But it seems "naively reasonable" for some definition.

[1] https://news.ycombinator.com/item?id=39828499

This is commonly called "tree shaking" [1], which is a particular form of general dead code elimination. One of the main challenges would be reproducing reasonably readable source code after the tree shaking.

[1] https://en.wikipedia.org/wiki/Tree_shaking
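
For what it's worth, the core of tree shaking is just graph reachability. Below is a minimal sketch, assuming you already have a map from each item to the items it references (getting that map out of real Rust code, after cfg and macro expansion, is the hard part and is not shown). The item names are invented for the example.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Return the items in `graph` that cannot be reached from any of `roots`,
/// i.e. the candidates for removal. The result is sorted for stable output.
fn unreachable_items<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> Vec<&'a str> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = roots.iter().copied().collect();
    // Breadth-first walk over everything the roots transitively reference.
    while let Some(item) = queue.pop_front() {
        if !seen.insert(item) {
            continue; // already visited
        }
        if let Some(referenced) = graph.get(item) {
            for &next in referenced {
                if !seen.contains(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    let mut dead: Vec<&str> = graph
        .keys()
        .copied()
        .filter(|k| !seen.contains(*k))
        .collect();
    dead.sort();
    dead
}

fn main() {
    // Hypothetical items of a small dependency and what each one references.
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert("main", vec!["parse"]);
    graph.insert("parse", vec!["lex"]);
    graph.insert("lex", vec![]);
    graph.insert("debug_dump", vec!["lex"]);
    graph.insert("legacy_api", vec!["debug_dump"]);

    // prints ["debug_dump", "legacy_api"]
    println!("{:?}", unreachable_items(&graph, &["main"]));
}
```

Deciding what is dead is the easy half; writing the surviving items back out as readable source is where the challenge mentioned above comes in.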