by re-thc 3 hours ago
> The big question for local LLMs is whether there is a 100 tok/s model which requires less than 16 GB of memory and is competitive on most tasks with the cloud models.

On benchmarks, maybe. In real-world use, no.

Outside of benchmarks, you simply need the context. There's no way around it.