Hacker News
by bob1029 475 days ago
> The question is if an LLM will run with usable performance at that scale?

For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.
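A rough back-of-the-envelope sketch of where the quadratic term comes from, assuming a naive implementation that writes out the full seq_len x seq_len score matrix for each head. The function name, the 2-byte (fp16) element size, and the single-head default are illustrative assumptions, not figures from any particular model:

```python
def attention_score_bytes(seq_len: int, num_heads: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes in the seq_len x seq_len attention score matrix, per layer.

    Assumes the scores are fully materialized (naive attention) and
    stored at bytes_per_elem bytes each (2 = fp16, an assumption).
    """
    return num_heads * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Doubling the sequence length quadruples the bytes that must be moved.
    for n in (1_024, 2_048, 4_096):
        print(f"seq_len={n:>5}  score matrix = {attention_score_bytes(n):,} bytes")
```

Each doubling of seq_len multiplies the score-matrix traffic by four, which is the ~quadratic scaling described above.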

1 comment

Someone has got to be working on a better method than that. Hundreds of billions are at stake.