by fnbr
900 days ago
It’s not clear to me what’s happening on the distillation front. I agree no one is doing it externally, but I suspect the foundation model companies are doing it internally; the performance is just too good for them not to. There’s a bunch of recent work that quantizes the activations as well, like FP8-LM, and I think this will come. Quantization support in PyTorch is pretty experimental right now, so I think we’ll see a lot of improvements as that support matures. The KV cache piece is tied to the activations imo: once those start getting quantized effectively, the KV cache will follow.
2) How much does human preference diverge from benchmark scores in your experience?
3) Do woodpeckers stop attacking houses when it’s winter in Alberta?