

by fnbr 900 days ago

It’s not clear to me what’s happening on the distillation front. I agree no one is doing it externally, but I suspect that the foundation model companies are doing it internally; the performance is just too good to pass up.

There’s a bunch of recent work that quantizes the activations as well, like FP8-LM, and I think this will come. Quantization support in PyTorch is pretty experimental right now, so I think we’ll see a lot of improvements as that support matures.

The KV cache piece is tied to the activations imo: once those start getting quantized effectively, the KV cache will follow.
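
To make the KV cache point concrete, here is a minimal sketch of what quantizing one cached key vector could look like. It assumes plain absmax int8 quantization with a single scale per vector, which is only one of many possible schemes and not what any particular library or paper does; the function names and the example values are made up for illustration.

```python
def quantize_absmax_int8(values):
    """Map floats to integer codes in [-127, 127] plus one float scale.

    Assumption: one scale per vector (absmax scaling). Real systems
    often use per-channel or per-group scales instead.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical cached key vector for one token and one attention head.
cached_key = [0.12, -1.5, 0.7, 0.0, 2.54]

codes, scale = quantize_absmax_int8(cached_key)
restored = dequantize(codes, scale)

# Worst-case round-trip error; bounded by about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(cached_key, restored))
print(codes, scale, max_error)
```

Each cached value then takes one byte instead of two (fp16) or four (fp32), at the cost of a rounding error of at most about half a step (`scale / 2`) per value.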

1 comments

sheikheddy 900 days ago

1) Any particular reasoning behind estimating that OpenAI’s margins are 60%?

2) How much does human preference diverge from benchmark scores in your experience?

3) Do woodpeckers stop attacking houses when it’s winter in Alberta?

fnbr 900 days ago

1) I actually think that’s too high; I bet it’s more like 30%. My logic is that they have to have _some_ margin, but LLMs are too expensive to run to have typical software margins. Total speculation though.

2) It generally tracks pretty well unless the model is gaming the metric (training on the test set, overfitting to the specific source of data, etc.). The relative rankings will typically match in both.

3) Alas, not with the mild winter North America’s having. They only stop below -5C or so. I am lucky though. The woodpecker stopped attacking my house and started attacking my neighbor’s. Even worse, it used to be a downy woodpecker, and it’s now been replaced by a pileated one (think: Woody).