Hacker News
by mfro 70 days ago
Strangely, it is super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.
steve-atx-7600 70 days ago
Inference from an LLM is O(n^2) in the number of tokens n (attention compares each token with every earlier one).
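To make the quadratic growth concrete, here is a minimal pure-Python sketch of causal self-attention. The function name `causal_attention` and the toy inputs are illustrative and not taken from any real model or library. Token i scores itself and every earlier token, so n tokens cost n(n+1)/2 query-key dot products, and doubling the length roughly quadruples the attention work. A KV cache avoids recomputing keys and values, but each new token still scans the whole context, so the total over a generation stays quadratic. The per-token projections and MLP layers scale linearly, which may be why the slowdown shows up mainly on longer messages.

```python
import math


def causal_attention(queries, keys, values):
    """Naive causal self-attention over lists of vectors.

    Returns (outputs, dot_products), where dot_products counts the
    query-key scores computed.
    """
    outputs = []
    dot_products = 0
    for i, q in enumerate(queries):
        # Token i attends to itself and every earlier token (positions 0..i).
        scores = []
        for k in keys[: i + 1]:
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)))
            dot_products += 1
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the softmax-weighted sum of the attended value vectors.
        dim = len(values[0])
        out = [sum(w * v[d] for w, v in zip(weights, values[: i + 1])) for d in range(dim)]
        outputs.append(out)
    return outputs, dot_products


# Doubling the sequence length roughly quadruples the number of scores.
for n in (100, 200, 400):
    vecs = [[1.0, 0.0]] * n
    _, count = causal_attention(vecs, vecs, vecs)
    print(n, count)  # prints 100 5050, then 200 20100, then 400 80200
```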
halJordan 69 days ago
Only in the naive implementations of attention
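Whether attention can avoid quadratic cost depends on what counts as non-naive. Memory-efficient kernels such as FlashAttention compute exact attention with much less memory traffic but still perform O(n^2) score computations; sub-quadratic variants instead change which tokens are attended to. Below is a hedged sketch comparing one such variant, sliding-window (local) attention, with full causal attention by counting the query-key scores each computes. The helper names and the window of 256 are arbitrary choices for illustration.

```python
def full_causal_dot_products(n: int) -> int:
    """Scores computed by full causal attention: token i sees i + 1 tokens."""
    return n * (n + 1) // 2


def sliding_window_dot_products(n: int, window: int) -> int:
    """Scores computed when each token attends to at most `window` tokens
    (itself plus up to window - 1 tokens before it)."""
    return sum(min(i + 1, window) for i in range(n))


# Full attention grows ~4x per doubling; the windowed version grows ~2x.
for n in (1_000, 2_000, 4_000):
    print(n, full_causal_dot_products(n), sliding_window_dot_products(n, window=256))
# prints:
# 1000 500500 223360
# 2000 2001000 479360
# 4000 8002000 991360
```

Once n exceeds the window, each extra token adds a fixed `window` scores, so the windowed cost is O(n * window) rather than O(n^2), at the price of tokens no longer seeing the full context directly.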