by ffast-math 1458 days ago

Author here. Ask me anything--happy to answer questions.

Also, if you like this kind of work, you might like what I've been building for the past year: Composer [1]. It speeds up neural net training by a lot (e.g., 7x faster for ResNet-50) [2] and, in contrast to Bolt/MADDNESS, is polished, documented code you can get working in <5min.

[1] https://github.com/mosaicml/composer

[2] https://www.mosaicml.com/blog/mosaic-resnet

4 comments

febin 1458 days ago

Thank you for your efforts. I came across your paper/code and posted it here. I was looking for a technique to cost-optimise transformer-based question answering. Presently I am using a CPU, since getting a GPU on AWS is too costly.

Since I use high-level code, I don't understand the maths completely. However, I was wondering whether your techniques can be beneficial on CPUs?

If I were to use this to improve a transformer-based architecture, what should my approach be?


ffast-math 1458 days ago

Thanks for posting it!

It should be possible to get large speedups on CPUs, but the trick will be gradually approximating each of the layers in the model (see my reply to sibling comment). It's not conceptually difficult, but will require a fair amount of C++ work to port the code to GPUs* for training, and it will probably go slower than dense ops on modern GPUs because tensor cores don't support our memory layout.

I think of this paper as the first in a two-part series, where the next one takes these fast ops and gets them working in full neural nets. (If anyone wants to do this project, happy to coadvise you / talk about it whenever; I won't have bandwidth to do it myself for the foreseeable future).

*Someone recently started doing this as part of their master's thesis: https://github.com/joennlae/halutmatmul
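One way to picture "gradually approximating each of the layers" is as a staged schedule. The sketch below is only an illustration under stated assumptions: the function name `progressive_freeze_schedule`, the layer names, and the fixed `epochs_per_stage` are all hypothetical, and the input-to-output order is an inference from the point made elsewhere in this thread that no gradient flows to the input of an approximated layer, so everything upstream of it has to be frozen already.

```python
def progressive_freeze_schedule(layer_names, epochs_per_stage):
    """Return one stage per layer, approximating layers from the input side first.

    Assumption: once a layer is approximated it passes no gradient to its
    input, so every layer before it must already be frozen.
    """
    stages = []
    for k in range(1, len(layer_names) + 1):
        stages.append({
            "start_epoch": (k - 1) * epochs_per_stage,
            "approximated": layer_names[:k],  # frozen and swapped for the fast op
            "trainable": layer_names[k:],     # still dense, still being fine-tuned
        })
    return stages


# Hypothetical layer names, purely for illustration.
layers = ["conv1", "block1", "block2", "fc"]
for stage in progressive_freeze_schedule(layers, epochs_per_stage=2):
    print(stage)
```

A real training loop would fine-tune the `trainable` layers during each stage so they can adapt to the approximation error introduced upstream.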


febin 1458 days ago

Thank you, I will try to take this up. What would be the best way to reach out to you?


ffast-math 1458 days ago

email. <my first name>@mosaicml.com


ta_tunestub 1458 days ago

Are you aware of any code/projects that convert something like a fully trained ResNet-50 model into a MADDNESS-optimized, approximate model?

An approximate but speedier ResNet inference model that runs on a CPU would be useful even if it's not quite as fast/accurate as a GPU inference model, since running a GPU currently tends to cost more than running CPUs.


ffast-math 1458 days ago

This master's thesis sort of does it for individual layers, but it doesn't have any fine-tuning yet so it completely wrecks the accuracy: https://github.com/joennlae/halutmatmul.

If someone worked on contributing this functionality to Composer [1] I'd be down to help out. I can't justify building it all on my own right now since we're 100% focused on training speedup, but I could definitely meet and talk through it, help code tricky parts, review PRs, etc.

[1] https://github.com/mosaicml/composer


nl 1458 days ago

There's lots of work in this area.

Quantization is a common technique. See for example https://pytorch.org/docs/stable/quantization.html
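For readers unfamiliar with the idea, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not PyTorch's API, and the function names `quantize_symmetric` and `dequantize` are made up for this example; it only shows the underlying trick of storing small integers plus one floating-point scale.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using a single scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per value is at most half the scale, which is why quantization usually costs little accuracy when the value range is narrow.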


fragmede 1457 days ago

Not to distract from the clearly good technical work you've done here, but why name it Bolt when there's a fintech startup with the same name, among other brand confusion issues (like the Chevy Bolt)?


ffast-math 1457 days ago

I named it this in 2017 and was only worried about name collisions with other GitHub repos and ML algorithms. Also, it's a backronym for "Based On Lookup Tables" and sounds at least somewhat evocative of going fast, so it was the best name for an algorithm I could come up with.


outlace 1458 days ago

I see it noted that this could speed up machine learning inference, but is there any hope of this being extended to also speed up training? I imagine that with a 100x speedup in matmuls, albeit approximate matmuls, one could plausibly train on a CPU.


ffast-math 1458 days ago

Yes. It's another research project to make this happen, but I think it would be fairly straightforward. The issue is that you can't backprop through the assignment step, so you get no gradient with respect to the input. This mandates a progressive layer freezing strategy. I don't think it would be too hard to get working though; you'd likely just need to train for longer, or start with a pretrained model and fine-tune it as you freeze + approximate the layers.
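To make the "assignment step" concrete, here is a small sketch of a lookup-table dot product in plain Python. It uses nearest-prototype assignment, as in classic product quantization, purely as a stand-in: MADDNESS learns a faster hash-style encoding instead. The names `encode`, `build_tables`, and `approx_dot` and the toy prototypes are illustrative and not taken from the Bolt codebase.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(x, prototypes, sub_dim):
    """Assignment step: replace each subvector of x by its nearest prototype's index.

    The output is a list of integers, so it is piecewise constant in x and
    carries no useful gradient with respect to the input.
    """
    codes = []
    for s, protos in enumerate(prototypes):
        sub = x[s * sub_dim:(s + 1) * sub_dim]
        codes.append(min(range(len(protos)), key=lambda k: sq_dist(sub, protos[k])))
    return codes


def build_tables(w, prototypes, sub_dim):
    """Precompute dot(prototype, matching slice of w) for every prototype."""
    tables = []
    for s, protos in enumerate(prototypes):
        w_sub = w[s * sub_dim:(s + 1) * sub_dim]
        tables.append([sum(p * wi for p, wi in zip(proto, w_sub)) for proto in protos])
    return tables


def approx_dot(codes, tables):
    """Approximate dot(x, w) with one table lookup per subspace, no multiplies."""
    return sum(table[c] for table, c in zip(tables, codes))


# Toy setup: 4-dim vectors split into two 2-dim subspaces, two prototypes each.
prototypes = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
w = [2.0, 3.0, 4.0, 5.0]
x = [0.9, 1.1, 0.1, 0.8]

tables = build_tables(w, prototypes, sub_dim=2)
codes = encode(x, prototypes, sub_dim=2)
approx = approx_dot(codes, tables)
exact = sum(a * b for a, b in zip(x, w))
print(codes, approx, exact)
```

Nudging `x` slightly leaves `codes` unchanged, and therefore leaves the output unchanged too, which is the "no gradient with respect to the input" problem in miniature.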


gxh8N 1458 days ago

Have you done any presentations at the big companies yet?


ffast-math 1458 days ago

Nope. I'd love to though.
