| HN Mirror

	by BoorishBears 53 days ago
	I like the technique described here of using distillation to recover from quantization, but I don't understand why we keep applying lossy compression to LLMs and then measuring the effects with benchmarks that were already nearly saturated before post-training. You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.