by calculito 754 days ago
Depends on what you want to do. Just for testing, most of the 7B models are a good compromise between quality and performance (that is, execution time).

Is there a way to reliably package these models with existing games and make them run locally? This would virtually make inference free right?

From my limited understanding of this field, if smaller models can run reliably and quickly on consumer hardware, that would be a game changer.

> This would virtually make inference free right?

Not really. Inference is never "free" unless you cache the result (which is just a static output) or unless you reduce complexity (which yields procedurally less-usable outputs).

Can you explain further? Why would it not be free if it's running locally?

I thought you meant "free" in terms of computational cost; running locally is technically free of charge, but it also requires a lot of processing power. Running inference on a properly sized LLM can starve the rest of your software of GPU/CPU/memory access, so you have to plan accordingly.
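
For a rough sense of that budget, here is a back-of-the-envelope sketch of the memory needed just to hold a model's weights. The function name and the bit widths are illustrative assumptions; the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# A 7B-parameter model at a few common weight precisions (assumed values).
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{weight_memory_gib(7e9, bits):.1f} GiB")
```

By this estimate a 7B model needs about 13 GiB for weights at 16-bit and about 3.3 GiB at 4-bit, which a game would have to share with its own textures, audio, and logic.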