by steve-atx-7600 70 days ago
Inference from an LLM is O(tokens^2)
1 comment

Only in naive implementations of attention.
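
To put rough numbers on that, here is a minimal sketch that just counts query-key dot products for n tokens. It compares full causal attention (every token attends to itself and all earlier tokens) with a sliding-window variant, which is only one example of a non-naive scheme. The function names and the window size of 256 are my own assumptions for illustration, not anything from a specific model or library.

```python
def full_causal_pairs(n: int) -> int:
    """Query-key dot products when each token attends to itself and all earlier tokens."""
    # Token t (0-based) scores against t + 1 keys, so the total is n * (n + 1) / 2.
    return sum(t + 1 for t in range(n))


def sliding_window_pairs(n: int, window: int) -> int:
    """Same count when each token sees at most `window` recent tokens (itself included)."""
    # Hypothetical fixed window: the per-token cost is capped, so the total grows ~ n * window.
    return sum(min(t + 1, window) for t in range(n))


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        full = full_causal_pairs(n)
        windowed = sliding_window_pairs(n, window=256)
        print(f"n={n}: full={full} windowed={windowed}")
```

Doubling n roughly quadruples the full count, while the windowed count roughly doubles once n is well past the window size. This only counts attention score computations; it ignores the MLP layers, memory traffic, and everything else that goes into real inference cost.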