HN Mirror



	by mfro 70 days ago
	Strangely, it is super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.

1 comments

steve-atx-7600 70 days ago

Inference from an LLM is O(tokens^2)
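
To make the quadratic claim concrete, here is a rough sketch that counts only the query-key dot products in standard causal self-attention, per head and per layer, assuming a KV cache so that earlier keys are not recomputed. The function names are made up for illustration, and the count ignores everything else in the model (MLP layers, sampling, memory bandwidth), so it is not a model of any particular phone or app.

```python
def attention_scores_for_token(position: int) -> int:
    """Query-key dot products for the token at 1-based `position`.

    Assumption: causal attention with a KV cache, so the new token's
    query is compared against every key up to and including itself.
    """
    return position


def total_attention_scores(num_tokens: int) -> int:
    """Total dot products over a whole sequence of `num_tokens` tokens."""
    return sum(attention_scores_for_token(p) for p in range(1, num_tokens + 1))


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        print(n, attention_scores_for_token(n), total_attention_scores(n))
```

Under these assumptions the work for each new token grows linearly with the context length, and the total over the whole sequence is n(n+1)/2, so doubling the token count roughly quadruples the attention work.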

halJordan 69 days ago

Only in the naive implementations of attention