I want to use local LLMs, and in fact I have enough VRAM (12GB) and RAM (96GB) to do it, but I gave up because it was pretty buggy with the Gemma 4 26B (A4B?) Q4 models. It also meant I had to give up local voice transcription because I needed all my VRAM dedicated to the LLM.
The other thing is I will ask an agent via Telegram to code stuff, so I want an agent that is smart enough to do it all. I prefer brute forcing with money right now. I hate when LLMs make bizarre mistakes; I end up spending way too much time figuring out the issue.
I use OpenRouter, so hopefully no one has built a perfect replica of me in their storage. I flip between models too.
But to be clear, I am living dangerously with agentic workflows in general. Haven't been burnt yet (other than accidentally running up a huge Gemini bill, which made me switch to Codex OAuth and OpenRouter for cheap MiniMax 2.7).
I am moving to a commander/orchestrator model to use both frontier and cheap models and eventually a better local LLM once I buy a 5070 Ti, 3090, 64GB Mac M1 Max, 128GB Strix Halo (probably missed that train) or the AMD R9700.