Hacker News new | ask | show | jobs
by RationPhantoms 1015 days ago
How much of it is OpenAI/Microsoft curtailing the compute being used to generate responses?
1 comments

The accuracy loss is more consistent with some kind of quantization of the model(-s) behind the scenes than the alignment gone wrong. Quantization to serve more users faster, on same amount or less of compute.
Sorry, what does quantization mean here?
Reducing the precision of the weights from high precision floating points to either lower precision floats or even integers. You'd think it would greatly reduce the performance of a model, but in most cases the decline in quality is extremely tolerable compared to the reduction in memory/processing requirements.
It means using less number of bits to store float values. This reduces the memory/compute requirement at the cost of making model less precise.
Reducing the precision of the parameters — result being less memory intensive