I prefer faster, dumber models because I provide the intelligence myself, and I use them only for tasks whose results are easy to verify: research (with sources), certain kinds of code analysis and code search, boilerplate generation, and so on. For that kind of work, a fast model is really key.
I have no desire to one-shot features (nor do I think that's a good use of LLMs), because even SotA models are incredibly bad at it. I'm optimizing for what they can actually do reliably and reasonably well, and I want those things done fast so I can get on with my work.
Generally, the thinking tokens are the verbose part of the output, so a faster model cuts the time spent generating them and you get your actual code back very quickly.