by bob1029 289 days ago
I think quantization is the simplest canary.

If we can reduce the precision of the model parameters by 2x to 32x without much perceptible drop in performance, we are clearly dealing with something wildly inefficient.
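To make the precision reduction concrete, here is a minimal sketch of symmetric uniform quantization applied to random Gaussian "weights". Everything in it is an assumption for illustration: the function names, the bit widths, and the synthetic weights are hypothetical, and it only measures how well the weights themselves are reconstructed, not the task performance of a real model. Compression ratios are stated relative to 32-bit floats.

```python
import math
import random


def quantize(weights, bits):
    """Round weights to a symmetric integer grid with `bits` bits, then map back to floats.

    Illustrative only: one shared scale for the whole list, bits >= 2.
    """
    qmax = 2 ** (bits - 1) - 1              # largest integer level, e.g. 127 for 8 bits
    largest = max(abs(w) for w in weights)
    if largest == 0.0:
        return list(weights)                # nothing to scale
    scale = largest / qmax                  # float value of one integer step
    return [round(w / scale) * scale for w in weights]


def relative_error(weights, bits):
    """Root-mean-square reconstruction error relative to the RMS of the weights."""
    restored = quantize(weights, bits)
    err = sum((w - q) ** 2 for w, q in zip(weights, restored))
    norm = sum(w ** 2 for w in weights)
    return math.sqrt(err / norm)


if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed so runs are repeatable
    weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (16, 8, 4, 2):
        ratio = 32 // bits                  # compression versus 32-bit floats
        print(f"{bits:2d} bits ({ratio:2d}x smaller): "
              f"relative error {relative_error(weights, bits):.4f}")
```

The reconstruction error grows as the bit width shrinks, so the interesting empirical question is how much of that error a trained network tolerates before its outputs degrade. Real quantization schemes typically use per-channel or per-group scales rather than the single global scale assumed here.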

I'm open to the possibility that overparameterization is essential to the training process, much like how MSAA/SSAA oversample the frame buffer to reduce aliasing in the final scaled result (also wildly inefficient, but generally very effective). However, I think these rules don't work the same way for more exotic architectures (spiking / time domain). You can't backpropagate through a recurrent SNN, so much of the prevailing machine learning mindset doesn't even apply.

1 comment

> It’s not clear that the inefficiency of the current paradigm is in the neural net architectures. It seems just as likely that it’s in the training objective.
Right. The objective is "correctly predict the entire training set", where that training set contains literally everything. So the objective becomes to speak every human language and every programming language, to understand every topic, and to master every weird sub-genre of culture. That's an inherently inefficient training objective if you just want an AI that can do some specific tasks. It's the whole insight behind task-specific models for summarization, text extraction, patch merging, etc.

And don't forget the noise. If you look at the Anthropic papers, it's clear from the examples they give that the dataset is still incredibly noisy even after extensive cleaning efforts. A lot of those parameters are being wasted trying to predict garbage output from HTML scraping gone wrong.