by ta_tunestub 1463 days ago
Are you aware of any code/projects that convert something like a fully trained ResNet-50 model into a MADDNESS-optimized, approximate model?

An approximate but speedier ResNet inference model that runs on a CPU would be useful even if it's not quite as fast or accurate as a GPU inference model, since running a GPU currently tends to cost more than running a CPU.

3 comments

This master's thesis sort of does it for individual layers, but it doesn't have any fine-tuning yet, so it completely wrecks the accuracy: https://github.com/joennlae/halutmatmul

If someone worked on contributing this functionality to Composer [1] I'd be down to help out. I can't justify building it all on my own right now since we're 100% focused on training speedup, but I could definitely meet and talk through it, help code tricky parts, review PRs, etc.

[1] https://github.com/mosaicml/composer

There's lots of work in this area.

Quantization is a common technique. See for example https://pytorch.org/docs/stable/quantization.html
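
For anyone who hasn't seen what quantization does to the numbers, here is a rough pure-Python sketch of 8-bit affine quantization of a few weights. It is only an illustration of the arithmetic, not PyTorch's API; the function names (`quantize`, `dequantize`) and the sample weights are made up for this example.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1] (affine scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (x - zero_point) for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [-1.2, -0.3, 0.0, 0.4, 2.5]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, restored, max_err)
```

Each restored weight is off by at most about half of `scale`, which is the price paid for storing one byte per weight instead of a 32-bit float.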

Not to distract from the clearly good technical work you've done here, but why name it Bolt when there's a fintech startup with the same name, among other brand confusion issues (like the Chevy Bolt)?
I named it this in 2017 and was only worried about name collisions with other GitHub repos and ML algorithms. Also, it's a backronym for Based On Lookup Tables, and it sounds at least somewhat evocative of going fast, so it was the best name for an algorithm I could come up with.
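
For readers wondering what "based on lookup tables" can look like in practice, below is a rough product-quantization-style sketch of an approximate dot product. It is not Bolt's or MADDNESS's actual code: the prototypes here are just randomly sampled training rows (the published methods learn theirs and use much cheaper encoding functions), and every name and size in it is an assumption for illustration.

```python
import random


def split(vec, n_sub):
    """Cut a vector into n_sub equal-length subvectors."""
    step = len(vec) // n_sub
    return [vec[i * step:(i + 1) * step] for i in range(n_sub)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def nearest(sub, protos):
    """Index of the prototype closest to `sub` in squared Euclidean distance."""
    return min(range(len(protos)),
               key=lambda k: sum((x - p) ** 2 for x, p in zip(sub, protos[k])))


random.seed(0)
dim, n_sub, n_proto = 8, 4, 16  # illustrative sizes only
train = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

# Assumption: prototypes are sampled training rows, one set per subspace.
sampled = random.sample(train, n_proto)
prototypes = [[split(row, n_sub)[s] for row in sampled] for s in range(n_sub)]

# Offline: the weight vector b is known ahead of time, so precompute
# the dot product of every prototype with the matching slice of b.
b = [random.gauss(0, 1) for _ in range(dim)]
b_subs = split(b, n_sub)
tables = [[dot(p, b_subs[s]) for p in prototypes[s]] for s in range(n_sub)]

# Online: encode the input as one small integer per subspace, then
# replace the multiply-adds with table lookups and a sum.
a = [random.gauss(0, 1) for _ in range(dim)]
codes = [nearest(sub, prototypes[s]) for s, sub in enumerate(split(a, n_sub))]
approx = sum(tables[s][c] for s, c in enumerate(codes))

exact = dot(a, b)
reconstructed = [x for s, c in enumerate(codes) for x in prototypes[s][c]]
print(exact, approx)
```

The approximation is exactly the dot product of `b` with the prototype-reconstructed version of `a`, so the error depends entirely on how well the prototypes cover the inputs. In this naive version the encoding step costs more than the multiply it replaces, which is why the real algorithms put so much effort into making encoding cheap.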