| Not sure if I would trade accuracy for speed. Yes, it's incredibly boring to wait for the AI agents in IDEs to finish their job. I get distracted and open YouTube. Once I gave Cline a prompt so big and complex that it spent 2 straight hours writing code. But after those 2 hours I spent 16 more tweaking and fixing all the stuff that wasn't working. I now realize I should have done things incrementally, even when I have a pretty good idea of the final picture. More and more I only use the "thinking" models: o3 in ChatGPT, and Gemini / Claude in IDEs. They're slower, but usually get it right. At the same time, I am open to the idea that speed can unlock new ways of using the tooling. It would still be awesome to basically just have a conversation with my IDE while I am manually testing the app. Or combine really fast models like this one with a "thinking" background one that would run for seconds or minutes but try to catch the bugs left behind. I guess only giving it a try will tell. |
Think of the old example where an autoregressive model would output "There are 2 possibilities..." before it had actually enumerated them. Often the model has trouble overcoming that bias and will hallucinate a response to fit the preceding tokens.
Chain of thought and other approaches help overcome this and other issues by incentivizing validation, etc.
With diffusion, however, it is easier for the rest of the generated answer to change that set of tokens to match the actual number of possibilities enumerated.
This is why I think you'll see diffusion models be able to do some more advanced problem solving with a smaller number of "thinking" tokens.
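To make the "2 possibilities" point concrete, here is a toy sketch in Python. It is not a real language model or a real diffusion sampler: `autoregressive_draft` and `diffusion_style_refine` are hypothetical names, and the hand-written "fix the count" rule stands in for whatever a trained denoiser would learn. It only illustrates the difference between freezing early tokens and being allowed to rewrite them.

```python
HEADER_LEN = 4   # "There", "are", "<count>", "possibilities:"
COUNT_POS = 2    # index of the count token in the header


def autoregressive_draft(count_guess, items):
    """Left-to-right generation: the count is emitted before the items
    exist, and once emitted it can never be changed."""
    tokens = ["There", "are", str(count_guess), "possibilities:"]
    tokens.extend(items)  # later tokens must live with the early guess
    return tokens


def diffusion_style_refine(tokens, steps=2):
    """Toy stand-in for iterative refinement: every pass may rewrite any
    position, so the count can be made consistent with what was listed.
    (Assumption: a real model learns this; here it is hard-coded.)"""
    tokens = list(tokens)  # do not mutate the caller's draft
    for _ in range(steps):
        listed = len(tokens) - HEADER_LEN
        tokens[COUNT_POS] = str(listed)
    return tokens


draft = autoregressive_draft(2, ["A", "B", "C"])
refined = diffusion_style_refine(draft)
print(" ".join(draft))
print(" ".join(refined))
```

In the draft the header still claims 2 while three items follow; after refinement the header agrees with the list. An autoregressive model would need extra "thinking" tokens to get the same correction.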