Wouldn't it be possible to trade speed back for accuracy, e.g. by asking the model to look at a problem from different angles, let it criticize its own output, etc.?
Just have it sample for longer or create a simple workflow that uses a monte carlo tree search approach. Don't see why this wont improve accuracy. I would love to see someone run tests to see how accurate the model is compared to similar parameter models in a per time block benchmark. Like if it can get same accuracy as a similar parameter autoregressive model but with half the speed, you already have a winner, besides other advantages of a diffusion based model.