Is there a way to reliably package these models with existing games and run them locally? That would essentially make inference free, right?
From my limited understanding of this field, if smaller models can run reliably and quickly on consumer hardware, that would be a game changer.
Not really. Inference is never "free" unless you cache the result (at which point it's just a static output) or reduce the model's complexity (which yields less useful outputs).
I took "free" to mean free in terms of computational cost. Running locally is technically free of charge, but it also requires a lot of processing power. Running inference on a properly sized LLM can starve the rest of your software of GPU/CPU/memory access, so you have to plan accordingly.
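A rough way to plan for this is to estimate how much memory the model weights alone will take and compare it with what's left after the game claims its share. Here is a back-of-the-envelope sketch in Python. The 7B model size, the 8 GiB card and the 4 GiB reserved for the game are assumed numbers for illustration. The estimate covers weights only, so the real footprint (KV cache, activations, runtime overhead) will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Excludes KV cache, activations and runtime overhead, so treat it as a lower bound.
    """
    return num_params * bits_per_param / 8 / 2**30


def fits_budget(num_params: float, bits_per_param: int,
                vram_gib: float, reserved_for_game_gib: float) -> bool:
    """True if the weights fit in the memory left over after the game's reservation."""
    available = vram_gib - reserved_for_game_gib
    return weight_memory_gib(num_params, bits_per_param) <= available


# Assumed scenario: 7B-parameter model, 8 GiB GPU, 4 GiB reserved for the game itself.
for bits in (16, 8, 4):
    gib = weight_memory_gib(7e9, bits)
    print(f"{bits}-bit weights: {gib:.1f} GiB, fits: {fits_budget(7e9, bits, 8, 4)}")
```

With these assumed numbers, only the 4-bit (quantized) variant fits. Because the estimate ignores the KV cache and other overhead, even that one would need headroom to spare in practice.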