| HN Mirror



	by fnbr 937 days ago
	Ha! I got a ton of new subscribers this morning and was wondering why. Let me know if I can answer any questions (I am the author).


ramesh1994 937 days ago

I think distillation in the original sense isn't being done anymore, but finetuning on outputs from larger models like GPT-4 is a form of distillation (top-1 logit vs. all logits, and curated synthetic data instead of the original dataset).
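To make the top-1 vs. all-logits distinction concrete, here is a minimal sketch in plain Python. It assumes a toy three-token vocabulary and made-up logits; the function names are hypothetical, and the teacher's argmax stands in for the text a larger model would actually emit.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the whole vocabulary (all logits)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(teacher_logits, student_logits):
    """Cross-entropy against the teacher's top-1 token only."""
    top1 = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    q = softmax(student_logits)
    return -math.log(q[top1])

# Toy logits, invented for illustration only.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]

print("all-logits loss:", soft_label_loss(teacher, student))
print("top-1 loss:     ", hard_label_loss(teacher, student))
```

The soft-label loss sees how the teacher spreads probability over every token, while the hard-label loss only sees which token won, which is all you get when training on generated text.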

On quantization, though, it's still weird that only the weights are quantized in methods like GPTQ / int8, while other methods quantize the activations as well. There's also the matter of the KV cache still being kept in its original 16-bit precision regardless, which is also unsolved here. Do you have any thoughts or insights on this?
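For readers unfamiliar with "weights only" quantization, a rough sketch follows. It uses plain round-to-nearest absmax int8 per row, which is much simpler than GPTQ and is not that algorithm; the function names and the tiny matrix are assumptions for illustration. The point is that the weights are stored as integers plus a scale, while the activations stay in floating point.

```python
def quantize_row(weights):
    """Symmetric absmax int8: returns (ints in [-127, 127], float scale)."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def matvec_weight_only(q_rows, scales, activations):
    """Matrix-vector product with int8 weights and float activations."""
    out = []
    for q, scale in zip(q_rows, scales):
        acc = sum(v * a for v, a in zip(q, activations))
        out.append(acc * scale)  # rescale back to the float range
    return out

# Toy weight matrix and activation vector, invented for illustration.
W = [[0.12, -0.5, 0.33], [1.0, 0.25, -0.75]]
x = [0.5, -1.0, 2.0]

quantized = [quantize_row(row) for row in W]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

y_q = matvec_weight_only(q_rows, scales, x)
y_ref = [sum(w * a for w, a in zip(row, x)) for row in W]
print("quantized:", y_q)
print("reference:", y_ref)
```

Quantizing the activations as well would mean rounding `x` the same way at runtime, which is harder because activation ranges vary from input to input.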


fnbr 937 days ago

It’s not clear to me what’s happening on the distillation front. I agree no one is doing it externally, but I suspect that the foundation model companies are doing it internally; performance is just too good.

There’s a bunch of recent work that quantizes the activations as well, like FP8-LM. I think that this will come. Quantization support in PyTorch is pretty experimental right now, so I think we’ll see a lot of improvements as it gets better support.

The KV cache piece is tied to the activations, imo: once those start getting quantized effectively, the KV cache will follow.
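As a back-of-the-envelope illustration of why the KV cache precision matters, here is a small sketch of the usual size formula: two tensors (keys and values) per layer, each of shape heads × head dimension × sequence length. The model dimensions below are hypothetical round numbers, not taken from any specific model, and the estimate ignores the small overhead of storing quantization scales.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Size of the KV cache: keys and values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
dims = dict(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)

fp16 = kv_cache_bytes(**dims, bytes_per_elem=2)
int8 = kv_cache_bytes(**dims, bytes_per_elem=1)

print(f"16-bit cache: {fp16 / 1024**3:.1f} GiB")
print(f" 8-bit cache: {int8 / 1024**3:.1f} GiB")
```

Under these assumed dimensions the cache halves from 2 GiB to 1 GiB per sequence, and the saving scales linearly with batch size and context length.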


sheikheddy 936 days ago

1) Any particular reasoning behind estimating that OpenAI’s margins are 60%?

2) How much does human preference diverge from benchmark scores in your experience?

3) Do woodpeckers stop attacking houses when it’s winter in Alberta?


fnbr 936 days ago

1) I actually think that’s too high; I bet it’s more like 30%. My logic is that they have to have _some_ margin, but LLMs are too expensive to have typical software margins. Total speculation though.

2) It generally tracks pretty well unless the model is gaming the metric (training on the test set, overfitting to the specific source of data, etc.). The relative rankings will typically match in both.

3) Alas, not with the mild winter North America’s having. They only stop below -5C or so. I am lucky though. The woodpecker stopped attacking my house and started attacking my neighbor’s. Even worse, it used to be a downy woodpecker, and it’s now been replaced by a pileated one (think: Woody).
