by calculito 755 days ago
I assume the question is really which LLM can cover the most tasks while delivering decent quality. I would prefer an architecture that uses different LLMs for different tasks, more like 'specialists' than simple 'agents'. I usually take the main task, divide it into smaller tasks, and see what I can use to solve each one. Sometimes a rule-based approach is already enough for a sub-task, and an LLM would be not only overkill but also more difficult to implement and maintain.
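To make the 'specialists' idea concrete, here is a rough sketch of how I would split things up. The task names and the model names in `SPECIALISTS` are made-up placeholders, not real models, and the actual model call is left out on purpose; it only shows the routing between a rule-based handler and a per-task LLM.

```python
import re
from typing import Callable, Optional

# Hypothetical specialist names: placeholders, not real model identifiers.
SPECIALISTS = {
    "summarize": "summarizer-llm",
    "code": "code-llm",
}

def extract_date(text: str) -> Optional[str]:
    """Rule-based sub-task: find an ISO date with a regex, no LLM needed."""
    match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return match.group(0) if match else None

# Sub-tasks that a plain rule can already solve.
RULES: dict[str, Callable[[str], Optional[str]]] = {
    "extract_date": extract_date,
}

def route(task: str, text: str) -> tuple[str, Optional[str]]:
    """Return (handler, result).

    Rule-based tasks are solved directly. For LLM tasks only the chosen
    specialist is returned; calling the model is outside this sketch.
    """
    if task in RULES:
        return ("rule", RULES[task](text))
    if task in SPECIALISTS:
        return (SPECIALISTS[task], None)
    raise ValueError(f"no handler for task {task!r}")
```

So `route("extract_date", ...)` never touches a model, while `route("code", ...)` only tells you which specialist would get the prompt.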

So what is your answer to the question?
Depends on what you want to do. Just for testing, most of the 7B models are a good compromise between quality and performance (i.e., execution time).
Is there a way to reliably package these models with existing games and make them run locally? This would virtually make inference free right?

From my limited understanding of this field, if smaller models can run reliably and quickly on consumer hardware, that would be a game changer.

> This would virtually make inference free right?

Not really. Inference is never "free" unless you cache the result (which is just a static output) or unless you reduce complexity (which yields procedurally less-usable outputs).
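A minimal sketch of the caching case, assuming the same prompt always maps to the same reply. `generate` is a hypothetical stand-in for the real model call, not any library's API.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def generate(prompt: str) -> str:
    """Stand-in for an expensive model call; the body runs only on a cache miss."""
    return f"reply to: {prompt}"

generate("greet the player")   # miss: the "model" runs
generate("greet the player")   # hit: the stored string is returned, no compute
```

The second call costs nothing, but it is also exactly the same text every time, which is the "static output" trade-off.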

Can you explain further? Why would it not be free if it's running locally?
I thought you meant "free" in terms of computational cost; running locally is technically free of charge, but it also requires a lot of processing power. Running inference on a properly-sized LLM can starve the rest of your software of GPU/CPU/memory access, so you have to plan accordingly.
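For a rough sense of what "plan accordingly" means, here is a back-of-the-envelope estimate of the memory the weights alone take. It ignores the KV cache, activations, and runtime overhead, so real usage is higher; the 8 GB budget and 4 GB reserve in the example are arbitrary assumptions, not measurements.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory for the weights only, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

def fits_budget(n_params: float, bits_per_weight: int,
                budget_gb: float, reserved_gb: float) -> bool:
    """True if the weights fit after reserving memory for the rest of the game."""
    return weight_memory_gb(n_params, bits_per_weight) <= budget_gb - reserved_gb

# A 7B model: about 14 GB at 16 bits per weight, about 3.5 GB at 4 bits.
print(weight_memory_gb(7e9, 16))
print(weight_memory_gb(7e9, 4))

# Hypothetical 8 GB GPU with 4 GB reserved for rendering.
print(fits_budget(7e9, 16, budget_gb=8, reserved_gb=4))
print(fits_budget(7e9, 4, budget_gb=8, reserved_gb=4))
```

Even when the quantized model fits, it still competes with the game for compute while it is generating.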