by ljosifov
7 days ago
Not replaced but supplemented. For offline coding, my current setup is pi + ds4-server + DeepSeek-V4-Flash REAP25 (on an M2 Max 96GB). For simpler programming-related tasks (e.g. text2sql) as well as synthetic data generation, the current best for me is llama.cpp + Gemma-4-26B-A4B (on a 7900 XTX 24GB GPU; sometimes nemotron-cascade-2-30b-a3b for 1M context). That and (dabbling now) auto-research use lots of tokens; I used to get paused all the time after running out of token quota. The first local model I found somewhat useful was glm-4.7-flash, and it's gotten way better since. Recently, between the OpenCode Go choice of models at many price points and DeepSeek-V4 improving IQ/$$$ by multiples, I have become less reliant on local LLMs for this auxiliary work. Claude I use, but with a Zai GLM-5.2 subscription. And I maintain a GPT subscription for quality models.