by Isuckatcode
756 days ago
Here [1] is a reference for the tokens/sec of Llama 3 on different Apple hardware. You can use it to judge whether that performance is acceptable for your agents. I would assume the tokens/sec would be much lower when the LLM agent runs alongside the game, since the game would also be using a portion of the CPU and GPU. I think this is something you need to test on your own to determine its usability. You can also look into lower-parameter models (3B, for example) to see whether the balance between accuracy and performance fits your use case.

> Is there a way to reliably package these models with existing games and make them run locally? This would virtually make inference free right?

I don't have any knowledge of game dev, so I can't comment on the packaging part, but yes, running it locally would make the inference free, in the sense that there is no per-request API cost because it runs on the player's hardware.

[1] https://github.com/ggerganov/llama.cpp/discussions/4167
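
To put rough numbers on "acceptable performance", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `reply_latency_seconds`, the `game_share` parameter, and the linear slowdown model are hypothetical, and the example values are placeholders rather than figures taken from [1]. Real contention between a game and a model is unlikely to be this simple, so treat it as a starting point before measuring on the actual hardware.

```python
def reply_latency_seconds(reply_tokens: int,
                          tokens_per_sec: float,
                          game_share: float = 0.0) -> float:
    """Estimate how long one agent reply takes to generate.

    reply_tokens   -- length of the reply you expect, in tokens
    tokens_per_sec -- generation speed measured with the model running alone
    game_share     -- ASSUMED fraction (0.0 to <1.0) of compute the game takes;
                      a crude linear model, not a measured relationship
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    if not 0.0 <= game_share < 1.0:
        raise ValueError("game_share must be in [0.0, 1.0)")
    # Throughput left for the model once the game takes its share.
    effective_tokens_per_sec = tokens_per_sec * (1.0 - game_share)
    return reply_tokens / effective_tokens_per_sec


if __name__ == "__main__":
    # Placeholder inputs: substitute the figure for your own hardware.
    alone = reply_latency_seconds(reply_tokens=60, tokens_per_sec=30.0)
    shared = reply_latency_seconds(reply_tokens=60, tokens_per_sec=30.0,
                                   game_share=0.5)
    print(f"model alone: {alone:.1f} s, sharing with game: {shared:.1f} s")
```

If the estimate with the game running is already too slow for an NPC reply, that is a hint to try a smaller model or shorter replies before investing in packaging.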