| What’s the main use-case for this? I get that I can run local models, but all the paid for (remote) models are superior. So is the use-case just for people who don’t want to use big tech’s models? Is this just for privacy conscious people? Or is this just for “adult” chats, ie porn bots? Not being cynical here, just wanting to understand the genuine reasons people are using it. |
I've invested heavily in local inference. For me, it's a mixture privacy, control, stability, cognitive security.
Privacy - my agents can work on tax docs, personal letters, etc.
Control - I do inference steering with some projects: constraining which token can be generated next at any point in time. Not possible with API endpoints.
Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization due to system load. Worse, they retire models, update their own system prompts, etc. They're not stable.
Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.