by embedding-shape 106 days ago
> which is much more the case for local inference than sending it away over a network

Of course, but that isn't what's unclear here.

What's unclear is why a 7B model would be better for those things than, say, a 14B model. The latency difference will be minuscule, yet parent somehow claimed that the smaller models make more sense for verification because latency is supposedly more important than accuracy.