by deepsquirrelnet 1139 days ago
A CD doesn’t work as an analogy. Think about it this way: if you build a model and don’t train it at all, it will still have the same number of parameters and take up the same amount of disk space.
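To make that concrete, here's a rough sketch in plain Python. The parameter count and the "training" step are made up for illustration (the noise is just a stand-in for whatever gradient updates would do). The point is only that the serialized size depends on how many weights there are and how wide each one is, not on their values.

```python
import random
from array import array

def serialized_size(weights):
    """Bytes needed to store the weights as 32-bit floats."""
    return len(array("f", weights).tobytes())

random.seed(0)
n_params = 10_000  # hypothetical model size, chosen for illustration

# Random initialization: a model that has never seen any data.
untrained = [random.gauss(0.0, 0.02) for _ in range(n_params)]

# Stand-in for training: the values change, the count does not.
trained = [w + random.gauss(0.0, 0.5) for w in untrained]

size_untrained = serialized_size(untrained)
size_trained = serialized_size(trained)
print(size_untrained, size_trained)
```

Both sizes come out identical, which is why "how much is stored on disk" says nothing about how much the model has learned.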

We’re finding out that many models are undertrained for their sizes, and a good option is to post-process them into smaller models by teaching a smaller model to mimic their output (distillation). Quantization effectively cuts down the model size as well. If quantizing causes no loss in quality, that means the model has not been trained enough to take advantage of the precision that is available.
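For the quantization part, here's a minimal sketch of symmetric int8 quantization in plain Python. It is one simple scheme picked for illustration, not what any particular library does, and the weights are random placeholders. It shows the storage going from 4 bytes to 1 byte per weight and the rounding error that gets introduced; whether that error hurts quality depends on how much the model actually relies on the fine-grained values.

```python
import random
from array import array

def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = array("b", (round(w / scale) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # placeholder weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(array("f", weights).tobytes())
int8_bytes = len(q.tobytes())
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(fp32_bytes, int8_bytes)  # int8 storage is a quarter of float32
print(max_err, scale / 2)      # rounding error is at most about half a step
```

Every weight gets moved by up to half a quantization step. If the model's outputs don't change after that, the information carried in those low-order bits wasn't being used.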