by bob1029 389 days ago
I think it is too risky to build a company around the premise that someone won't soon solve the quadratic scaling issue, especially when that company involves creating ASICs.

E.g.: https://arxiv.org/abs/2312.00752

2 comments

Attention is not the primary inference bottleneck. For each token you have to load all of the weights (or activated weights) from memory. This is why Cerebras is fast: they have huge memory bandwidth.
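To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes single-stream decoding (batch size 1) where every active weight is read from memory once per token, and it ignores KV-cache reads, compute time, and any overlap between them. The function name and the figures used (70B active parameters, 2 bytes per parameter, 2 TB/s of bandwidth) are illustrative assumptions, not specs for any particular chip or model.

```python
def max_decode_tokens_per_second(active_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on tokens/s when decoding is limited only by weight loading.

    Assumes every active weight is read from memory exactly once per token
    (batch size 1) and that nothing else competes for bandwidth.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


# Illustrative (assumed) numbers: 70e9 active params, 16-bit weights, 2 TB/s.
baseline = max_decode_tokens_per_second(70e9, 2, 2e12)

# Same model with 10x the memory bandwidth: the bound scales linearly.
faster = max_decode_tokens_per_second(70e9, 2, 20e12)

print(f"{baseline:.1f} tokens/s vs {faster:.1f} tokens/s")
```

Under these assumptions the bound moves only with bandwidth and weight size, and sequence length never enters the formula.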
Yeah, this also strikes me as quite risky. Their gear seems very focused on the Llama family specifically.

It just takes one breakthrough and it's all different. See the recent diffusion-style LLMs, for example.