	by quest88 252 days ago
	What do you mean by "properly"? What behavior would one observe if they did run an LLM?

2 comments

burnte 252 days ago

"Properly" means at some arbitrary speed that the writer would describe as "fast" or "fast enough". If you have a lower demand for speed, they'll run fine.

nik736 252 days ago

If you have enough memory to load a model but not enough memory bandwidth to read it quickly, you will get very low tokens/s output.

Rohansi 252 days ago

You can also have enough bandwidth but be compute-limited and get lower performance than expected. This is more likely to be the case for Apple Silicon vs. high-power GPUs.
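
For anyone who wants to put rough numbers on the two limits discussed in this thread, here is a back-of-the-envelope sketch. It assumes a dense model generating one token at a time, where every token reads all of the weights once and costs about 2 FLOPs per parameter. It ignores KV-cache traffic, batching, and real-world efficiency, so treat the result as an upper bound and not a prediction. The function name and all hardware figures are made up for illustration.

```python
def decode_tokens_per_second(model_bytes, mem_bandwidth_bytes_per_s,
                             params, compute_flops_per_s):
    """Rough upper bound on single-stream decode speed.

    Assumptions (not from the thread): all weights are read once per token,
    and each token costs about 2 FLOPs per parameter.
    """
    # Ceiling set by how fast the weights can be streamed from memory.
    bandwidth_bound = mem_bandwidth_bytes_per_s / model_bytes
    # Ceiling set by how fast the arithmetic can be done.
    compute_bound = compute_flops_per_s / (2 * params)
    limit = "bandwidth" if bandwidth_bound <= compute_bound else "compute"
    return min(bandwidth_bound, compute_bound), limit


# Hypothetical 7B-parameter model quantized to 4 bits (about 3.5 GB of weights).
PARAMS = 7e9
MODEL_BYTES = 3.5e9

# Hypothetical machine A: 100 GB/s memory bandwidth, 10 TFLOP/s of compute.
rate_a, limit_a = decode_tokens_per_second(MODEL_BYTES, 100e9, PARAMS, 10e12)
print(f"A: at most {rate_a:.1f} tokens/s, limited by {limit_a}")

# Hypothetical machine B: same bandwidth, but only 0.1 TFLOP/s of compute.
rate_b, limit_b = decode_tokens_per_second(MODEL_BYTES, 100e9, PARAMS, 0.1e12)
print(f"B: at most {rate_b:.1f} tokens/s, limited by {limit_b}")
```

With these made-up figures, machine A is capped by memory bandwidth and machine B by compute, even though both have enough memory to load the model.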