Hacker News
by grungegun 815 days ago
So the transformer architecture doesn't enable any performance gain for quantization? It seems very strange that quantization works so well, since in most of my experiments the internal weights of MLPs look random.
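To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is not taken from any real model: the Gaussian weights (standard deviation 0.02), the tensor size, and the function names `quantize_int8` and `quantization_error` are all assumptions for illustration. It applies plain symmetric per-tensor int8 rounding to random-looking weights and measures how much they move.

```python
import math
import random


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer codes, scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def quantization_error(n=4096, seed=0):
    """Quantize n Gaussian weights (an assumed stand-in for MLP weights).

    Returns (relative L2 error, largest per-weight error, scale).
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.02) for _ in range(n)]
    codes, scale = quantize_int8(weights)
    restored = [c * scale for c in codes]  # dequantize back to floats

    errors = [w - r for w, r in zip(weights, restored)]
    rel = math.sqrt(sum(e * e for e in errors) / sum(w * w for w in weights))
    max_err = max(abs(e) for e in errors)
    return rel, max_err, scale


if __name__ == "__main__":
    rel, max_err, scale = quantization_error()
    print(f"relative L2 error: {rel:.4f}")
    print(f"max per-weight error: {max_err:.6f} (half a step is {scale / 2:.6f})")
```

With round-to-nearest, no weight moves by more than half a quantization step, so weights that look like noise can still be stored coarsely as long as the tensor has no extreme outliers stretching the scale. Whether that fully explains why quantized models keep their accuracy is a separate question; this only shows the per-weight error.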